RECOVERING A TREE FROM THE LENGTHS OF SUBTREES
SPANNED BY A RANDOMLY CHOSEN SEQUENCE OF LEAVES


STEVEN N. EVANS AND DANIEL LANOUE 


Abstract. Given an edge-weighted tree T with n leaves, sample the leaves
uniformly at random without replacement and let $W_k$, $2 \le k \le n$, be the
length of the subtree spanned by the first k leaves. We consider the question,
“Can T be identified (up to isomorphism) by the joint probability distribution
of the random vector $(W_2, \ldots, W_n)$?” We show that if T is known a priori
to belong to one of various families of edge-weighted trees, then the answer
is, “Yes.” These families include the edge-weighted trees with edge-weights
in general position, the ultrametric edge-weighted trees, and certain families
with equal weights on all edges such as (k + 1)-valent and rooted k-ary trees
for k ≥ 2 and caterpillars.

1. Introduction

1.1. Background and motivation. What features of an edge-weighted tree
identify it uniquely up to isomorphism, perhaps within some class of such trees?
Here an edge-weighted tree is a connected, acyclic finite graph T with vertex
set V(T) and edge set E(T) which is equipped with a function
$W_T : E(T) \to \mathbb{R}_{++} := (0, \infty)$. The value of $W_T(e)$ for an
edge $e \in E(T)$ is called the weight or the length of e. Two such trees T' and
T'' are isomorphic if there is a bijection $\sigma : V(T') \to V(T'')$ such that:

• $\{u, v\} \in E(T')$ if and only if $\{\sigma(u), \sigma(v)\} \in E(T'')$,

• $W_{T'}(\{u, v\}) = W_{T''}(\{\sigma(u), \sigma(v)\})$ for all $\{u, v\} \in E(T')$.

The question above is, more formally, one of asking for a given class of
edge-weighted trees $\mathcal{T}$ about the possible sets U and functions
$\Phi : \mathcal{T} \to U$ such that for all $T', T'' \in \mathcal{T}$ we have
$\Phi(T') = \Phi(T'')$ if and only if T' and T'' are isomorphic.
If the class $\mathcal{T}$ consists of edge-weighted trees for which all edges
have length 1 (we will call such objects combinatorial trees for the sake of
emphasis), then determining whether two trees in $\mathcal{T}$ are isomorphic is
just a particular case of the standard graph isomorphism problem. The general
graph isomorphism problem has been the subject of a large amount of work in
combinatorics and computer science - [RC77] already speaks of the “graph
isomorphism disease” - and, in particular, there are many results on
reconstructing the isomorphism type of a graph from the isomorphism types of
subgraphs of various sorts (see, for example, the review [Bon91]). There is also
a substantial volume of somewhat parallel research on graph isomorphism in
computational chemistry (see, for example, [Diu] for a review). There seems to
be considerably less work on determining isomorphism (in the obvious sense) of
edge-weighted graphs. Of course, in order for two edge-weighted graphs to be
isomorphic the underlying combinatorial graphs must be isomorphic, but this does
not imply that the best way of checking that two edge-weighted graphs are
isomorphic is to first determine whether the underlying combinatorial graphs are
isomorphic and then somehow test whether some isomorphism of the combinatorial
graphs is still an isomorphism when the edge-weights are considered.

We begin with a discussion of previous results that address various aspects of 
the problem of determining when two edge-weighted or combinatorial trees are 
isomorphic. 

A result in [Bed74] gives the following criterion for a bijection
$\sigma : V(T') \to V(T'')$, where T' and T'' are combinatorial trees, to be an
isomorphism: if $v_0, v_1, \ldots, v_m$ is any sequence from
$V(T') \sqcup V(T'')$ such that $v_0 = v_m$ and

\[ \{v_i, v_j\} \in E(T') \sqcup E(T'') \sqcup \{\{u, \sigma(u)\} : u \in V(T')\} \iff i - j = \pm 1 \bmod m, \]

then m = 4.

The above result is elegant, but, of course, one does not need to apply it to all 
possible bijections to determine whether two combinatorial trees are isomorphic: 
there is a much more explicit and efficient procedure, which we now describe for 
the sake of completeness. First of all, suppose that T' and T'' have distinguished
vertices $\rho'$ and $\rho''$ and, in addition to the requirements in the above
definition of an isomorphism $\sigma$, we require that $\sigma$ maps $\rho'$ to
$\rho''$; that is, we have rooted trees and we require that an isomorphism maps
the root of one tree to the root of the other. The presence of a root allows us
to think of a combinatorial tree as a directed graph, where the head of an edge
is the vertex that is closer to the root and the tail is the vertex farther from
the root. The children of a vertex are the adjacent vertices that are farther
from the root and, more generally, the descendants of a vertex u are those
vertices v such that the path from the root to v passes through u. The subtree
spanned by a vertex u and its descendants contains no other vertices and can be
thought of as a combinatorial tree rooted at u, and we call this subtree the
subtree below u. Then, two rooted, combinatorial trees T' and T'' are isomorphic
if the two roots have the same number of children, say m, and there is an
ordering of these children for each tree such that, for 1 ≤ i ≤ m, the subtree
below the i-th child of the root of T' is isomorphic (as a rooted, combinatorial
tree) to the subtree below the i-th child of the root of T''. This observation
can be turned into an efficient algorithm (see, for example, [AHU75, Example
3.2]). Now, two combinatorial trees are isomorphic if there is some choice of
roots such that the resulting rooted, combinatorial trees are isomorphic. A
center of a combinatorial tree is a vertex c such that

\[ \max_{v \in V(T)} r_T(c, v) = \min_{u \in V(T)} \max_{v \in V(T)} r_T(u, v), \]

where $r_T(u, v)$ is the number of edges in the unique path between u and v for
$u, v \in V(T)$, and a combinatorial tree has either a unique center or two
centers that are adjacent. It is therefore possible to determine if two
combinatorial trees are isomorphic by rooting each of them at their various
centers and checking if any two such rooted, combinatorial trees are isomorphic.
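
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithm of [AHU75] itself: the function names `tree_from_edges`, `centers`, `canonical_form`, and `isomorphic` are hypothetical, trees are assumed to be given as edge lists with at least one edge, and the recursive string encoding is only intended for small trees.

```python
def tree_from_edges(edges):
    """Build an adjacency map {vertex: set of neighbours} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def centers(adj):
    """Return the one or two centers by repeatedly stripping the current leaves."""
    degree = {v: len(ns) for v, ns in adj.items()}
    layer = [v for v, d in degree.items() if d <= 1]
    remaining = len(adj)
    while remaining > 2:
        remaining -= len(layer)
        next_layer = []
        for v in layer:
            for u in adj[v]:
                degree[u] -= 1
                if degree[u] == 1:  # u has just become a leaf of what is left
                    next_layer.append(u)
        layer = next_layer
    return layer


def canonical_form(adj, v, parent=None):
    """Encode the subtree below v as a string; sorting the encodings of the
    children makes the result independent of the order of the children."""
    children = sorted(canonical_form(adj, u, v) for u in adj[v] if u != parent)
    return "(" + "".join(children) + ")"


def isomorphic(adj1, adj2):
    """Two combinatorial trees are isomorphic iff some rooting of each at a
    center gives equal canonical forms."""
    if len(adj1) != len(adj2):
        return False
    forms1 = {canonical_form(adj1, c) for c in centers(adj1)}
    forms2 = {canonical_form(adj2, c) for c in centers(adj2)}
    return bool(forms1 & forms2)


path_a = tree_from_edges([(1, 2), (2, 3), (3, 4)])
path_b = tree_from_edges([("x", "w"), ("z", "y"), ("w", "z")])
star = tree_from_edges([(0, 1), (0, 2), (0, 3)])
print(isomorphic(path_a, path_b), isomorphic(path_a, star))
```

Rooting at a center matters because any isomorphism of unrooted trees must map centers to centers, so at most two rootings per tree need to be compared.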

We, however, are interested in whether there are “statistics” of a more
numerical character that can be used to decide tree isomorphism. For
combinatorial trees, one somewhat obvious possibility is the multiset of
eigenvalues of some matrix associated with the tree such as the adjacency matrix
or the distance matrix. Unfortunately, the results of [Sch73, BM93, SF83, FGM97,
ME11, BES12] show that not only is the isomorphism type of a tree not uniquely
determined by the spectrum of its adjacency matrix but for various ensembles of
combinatorial trees if one picks a tree uniformly at random from those in the
ensemble with n vertices, then the probability there is another tree in the
ensemble with an adjacency matrix that has the same spectrum converges to one as
$n \to \infty$. The results of [ME11] can be used to show that an analogous
phenomenon is present when one considers the spectrum of the matrix of
leaf-to-leaf distances.

Two trees have adjacency matrices with the same spectrum if and only if the 
characteristic polynomials of the adjacency matrices are equal. Given some
irreducible representation of the symmetric group on the number of letters equal
to the dimension of a square matrix, the immanantal polynomial of the matrix is
constructed in the same manner as the characteristic polynomial except that the
determinant is replaced by a similarly defined object for which the sign
character is replaced by the character of the representation. One might hope
that the immanantal polynomials are more successful at deciding isomorphism of
combinatorial trees, but a result of [BM93] shows that if the adjacency matrices
of two combinatorial trees have the same characteristic polynomials, then they
have the same immanantal polynomials for every irreducible representation. We
note that [Tur68] already contains an example of two combinatorial trees with
adjacency matrices that are explicitly shown to have the same immanantal
polynomial.

The greedoid Tutte polynomial of a combinatorial tree T encodes for each i
and $\ell$ the number of subtrees of T that have i internal vertices and $\ell$
leaves. It was conjectured in [GMOY95] that this information identifies the
isomorphism type of a combinatorial tree. However, it was shown in [EG06] that
there are infinitely many pairs of nonisomorphic caterpillars that share the
same greedoid Tutte polynomial: a caterpillar is a combinatorial tree that
consists of some number of internal vertices along a single path and leaves that
are each adjacent to one of the internal vertices. This contrasts with the
situation for rooted, combinatorial trees; it is shown in [GM89] that there is a
two-variable polynomial defined for all rooted, directed graphs (and hence, in
particular, for rooted, combinatorial trees) such that two rooted, combinatorial
trees have the same polynomial if and only if they are isomorphic. The
polynomial in [GM89] is defined recursively, but it is not hard to see that it
encodes in a compact manner the total number of vertices in the tree, the number
of children of the root, the number of vertices in each of the subtrees below
the children of the root, and so on.

The chromatic symmetric function of a graph was introduced in [Sta95]. A
proper coloring of a finite graph is a function $\kappa$ from the vertices of
the graph to $\mathbb{N}$ such that adjacent vertices are assigned different
values. We can introduce an equivalence relation on the proper colorings by
declaring that two colorings $\kappa'$ and $\kappa''$ are equivalent if there is
a bijection $\pi : \mathbb{N} \to \mathbb{N}$ such that
$\kappa'' = \pi \circ \kappa'$. For a graph with m vertices, each equivalence
class gives rise to a partition
$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_k > 0$ of m by taking, for any
$\kappa$ in the equivalence class, $\lambda_i$ to be the i-th largest of the
cardinalities $\#\{v : \kappa(v) = j\}$ as j ranges over $\mathbb{N}$. The
chromatic symmetric function encodes for each partition of m the number of
equivalence classes of colorings that give rise to that partition. It was
conjectured in [Sta95] that nonisomorphic combinatorial trees have distinct
chromatic symmetric functions. It was shown in [MMW08] that this conjecture is
true for caterpillars and that paper also reports on computational results
verifying that the conjecture holds for the class of trees with at most 23
vertices. Further work related to the conjecture for the special case of trees
with a single centroid is contained in [OS14].

Our point of departure in this paper is the well-known fact [Zar65, SP69, Bun71,
Bun74] that an edge-weighted tree can be reconstructed from its matrix of
leaf-to-leaf distances (see [Fel04] for an indication of the importance of this
observation in the statistical reconstruction of phylogenetic trees). In fact,
an edge-weighted tree with n leaves can be reconstructed from the collection of
total lengths of subtrees spanned by all subsets of m leaves provided
$n \ge 2m - 1$ [PS04]. We remark that the total length of the subtree spanned by
a set of leaves is an important quantity in phylogenetics where it is called the
phylogenetic diversity of the corresponding set of taxa [HS07].

Given these results, one might imagine that the multiset of leaf-to-leaf distances 
suffices to identify the isomorphism type of an edge-weighted tree. This is certainly 
not the case. For example, consider the two combinatorial caterpillars T' and T''
with 25 leaves each, where T' has 3 internal vertices a', b', c' in order along a
path that are adjacent respectively to 2, 11, 12 leaves, and T'' has 3 internal
vertices a'', b'', c'' in order along a path that are adjacent respectively to
3, 14, 8 leaves. Taking the $\binom{25}{2}$ pairs of distinct leaves in T', we
see that the distance 2 appears
$\binom{2}{2} + \binom{11}{2} + \binom{12}{2} = 1 + 55 + 66 = 122$ times, the
distance 3 appears 2 × 11 + 11 × 12 = 154 times, and the distance 4 appears
2 × 12 = 24 times. Similarly, taking the $\binom{25}{2}$ pairs of distinct
leaves in T'', we see that the distance 2 appears
$\binom{3}{2} + \binom{14}{2} + \binom{8}{2} = 3 + 91 + 28 = 122$ times, the
distance 3 appears 3 × 14 + 14 × 8 = 154 times, and the distance 4 appears
3 × 8 = 24 times. Probabilistically, we have just shown that if we pick
two leaves uniformly at random without replacement from an edge-weighted tree, 
then the isomorphism type of the tree is not uniquely identified by the probability 
distribution of the distance between the two leaves. 

Note in this last example that if we looked at the multisets of lengths of
subtrees spanned by three leaves, then we would see the length 3 appearing
$\binom{2}{3} + \binom{11}{3} + \binom{12}{3} = 0 + 165 + 220 = 385$ times for
T' and $\binom{3}{3} + \binom{14}{3} + \binom{8}{3} = 1 + 364 + 56 = 421$ times
for T'', and hence the probability distribution of the length of the subtree
spanned by three leaves chosen uniformly at random is not the same for the two
trees.
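
The counts in the caterpillar example can be checked mechanically. The following is a small sketch, assuming a caterpillar with unit edge lengths is described by the tuple of numbers of leaves attached to the successive internal vertices; the function names are hypothetical. It uses the facts that two leaves attached to internal vertices i and j are at distance |i - j| + 2, and that three leaves span a subtree of length 3 exactly when they are attached to the same internal vertex.

```python
from math import comb


def distance_counts(leaf_counts):
    """Histogram {distance: number of unordered leaf pairs} for a caterpillar
    whose i-th internal vertex carries leaf_counts[i] leaves."""
    hist = {}
    for i, a in enumerate(leaf_counts):
        # Two leaves on the same internal vertex are at distance 2.
        hist[2] = hist.get(2, 0) + comb(a, 2)
        for j in range(i + 1, len(leaf_counts)):
            d = (j - i) + 2
            hist[d] = hist.get(d, 0) + a * leaf_counts[j]
    return hist


def triples_of_length_three(leaf_counts):
    """Number of leaf triples whose spanned subtree has total length 3."""
    return sum(comb(a, 3) for a in leaf_counts)


first, second = (2, 11, 12), (3, 14, 8)
print(distance_counts(first) == distance_counts(second))
print(triples_of_length_three(first), triples_of_length_three(second))
```

The first line printed reports that the two distance histograms agree, while the second shows that the three-leaf counts differ.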

In order to proceed further, we need to introduce some more notation. Write 
L(T) for the set of leaves of an edge-weighted tree T. Given a subset K of L(T),
let $W_T(K)$ be the length of the subtree spanned by K; that is, $W_T(K)$ is the sum
of the lengths of the edges in the smallest connected subgraph of T with a vertex 
set that contains K. 

It is possible to calculate the total length of T, that is, $W_T(L(T))$, using the
following result from [SS04] that extends one for the special case of 3-valent
trees in [Pau00]. Write $d_T(v)$ for the degree of an interior vertex v of T
(that is, $v \in V(T) \setminus L(T)$). For distinct leaves $x, y \in L(T)$
denote by $I_T(x, y)$ the set of interior vertices on the unique path in T
between x and y and put

\[ h_T(x, y) := \prod_{v \in I_T(x, y)} (d_T(v) - 1)^{-1}. \]

Let $r_T(x, y)$ be the sum of the lengths of the edges in the path between x and y.
Then,

\[ W_T(L(T)) = \sum_{\{x, y\} \subseteq L(T),\, x \ne y} h_T(x, y) \, r_T(x, y). \]

Of course, a similar formula gives $W_T(K)$ for any $K \subseteq L(T)$; the path between a
pair of leaves of the subtree is the same as the path between them in T, the length 
of this path is the same in the subtree as it is in T, but the degree of an interior 
vertex of the subtree can be less than its degree as an interior vertex of T. 
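
As a sanity check, the pairwise formula for the total length can be evaluated on a small example and compared with the plain sum of the edge lengths. This is only a sketch: the tree below, its vertex names, and the helper functions `weighted_tree`, `path`, and `total_length_from_pairs` are all hypothetical, and the weight of each pair is taken to be the product of $(d_T(v) - 1)^{-1}$ over the interior vertices of the connecting path, as in the formula above.

```python
from fractions import Fraction
from itertools import combinations


def weighted_tree(edges):
    """Adjacency map {u: {v: length}} from a list of (u, v, length) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def path(adj, x, y):
    """Vertices on the unique path from x to y, found by depth-first search."""
    stack = [[x]]
    while stack:
        p = stack.pop()
        if p[-1] == y:
            return p
        for u in adj[p[-1]]:
            if len(p) < 2 or u != p[-2]:  # never step straight back
                stack.append(p + [u])


def total_length_from_pairs(adj):
    """Sum of h(x, y) * r(x, y) over unordered pairs of distinct leaves."""
    leaves = [v for v in adj if len(adj[v]) == 1]
    total = Fraction(0)
    for x, y in combinations(leaves, 2):
        p = path(adj, x, y)
        r = sum(adj[s][t] for s, t in zip(p, p[1:]))  # path length
        h = Fraction(1)
        for v in p[1:-1]:  # interior vertices of the path
            h /= len(adj[v]) - 1
        total += h * r
    return total


example = weighted_tree([
    ("a", "u", 1), ("b", "u", 2), ("u", "v", 3),
    ("c", "v", 4), ("d", "v", 5), ("e", "v", 6),
])
print(total_length_from_pairs(example))  # equals 1 + 2 + 3 + 4 + 5 + 6 = 21
```

Exact rational arithmetic is used so that the comparison with the sum of the edge lengths is not affected by rounding.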

Suppose that #L(T) = n and $Y_1, \ldots, Y_n$ is a uniformly distributed random
listing of L(T); that is, $Y_1, \ldots, Y_n$ is the result of sampling the leaves
of T uniformly at random without replacement. Set
$W_k := W_T(\{Y_1, \ldots, Y_k\})$ for $2 \le k \le n$; that is, the random
variable $W_k$ is the length of the subtree spanned by the first k of the
randomly chosen leaves. We write $\mathbf{W}_T$ for the (n − 1)-dimensional
random vector $(W_2, \ldots, W_n)$ and call this random vector the random length
sequence of T.

In this paper we address the following question. 

Question 1.1. Can we reconstruct the edge-weighted tree T up to isomorphism 
from the joint probability distribution of the random length sequence $\mathbf{W}_T$?

Another way of framing this question is the following. Write $y_1, \ldots, y_n$
for the leaves of T and let $\mathcal{J}_T$ be the multiset with cardinality n!
that results from listing the (n − 1)-dimensional vectors

\[ \bigl( W_T(\{y_{\pi(1)}, y_{\pi(2)}\}), \; W_T(\{y_{\pi(1)}, y_{\pi(2)}, y_{\pi(3)}\}), \; \ldots, \; W_T(\{y_{\pi(1)}, \ldots, y_{\pi(n)}\}) \bigr) \]

as $\pi$ ranges over the permutations of [n] := {1, ..., n}. We stress that
$\mathcal{J}_T$ is a multiset; that is, we do not know which increasing
sequences of lengths go with which ordered listings of the leaves.

Question 1.2. Can we reconstruct the edge-weighted tree T up to isomorphism
from the multiset of length sequences $\mathcal{J}_T$?
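
For a small tree the multiset of length sequences can be enumerated by brute force, which makes the equivalence of the two formulations concrete: dividing each multiplicity by n! gives the joint probability distribution of the random length sequence. The sketch below is illustrative only; the helper names `weighted_tree`, `spanned_length`, and `length_sequence_multiset` and the example star are assumptions, and the enumeration over all n! listings is feasible only for a handful of leaves.

```python
from collections import Counter
from itertools import permutations


def weighted_tree(edges):
    """Adjacency map {u: {v: length}} from a list of (u, v, length) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def spanned_length(adj, leaves):
    """Total length of the smallest connected subgraph containing `leaves`.
    An edge belongs to it iff chosen leaves lie on both sides of the edge."""
    chosen = set(leaves)
    root = next(iter(adj))
    parent, order = {root: None}, [root]
    for v in order:  # breadth-first search; `order` grows while we iterate
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)
    below = {v: int(v in chosen) for v in adj}  # chosen leaves at or below v
    total = 0
    for v in reversed(order[1:]):  # children are handled before parents
        if 0 < below[v] < len(chosen):
            total += adj[v][parent[v]]
        below[parent[v]] += below[v]
    return total


def length_sequence_multiset(adj):
    """Counter of the vectors (W_2, ..., W_n) over all n! listings of the leaves."""
    leaves = [v for v in adj if len(adj[v]) == 1]
    multiset = Counter()
    for listing in permutations(leaves):
        seq = tuple(spanned_length(adj, listing[:k])
                    for k in range(2, len(leaves) + 1))
        multiset[seq] += 1
    return multiset


star = weighted_tree([("c", "x", 1), ("c", "y", 2), ("c", "z", 3)])
print(length_sequence_multiset(star))
```

Because the leaf labels are discarded, relabelling the vertices of the tree leaves the resulting counter unchanged.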

We end this section with some remarks about the problem of reconstructing trees 
from various so-called decks, as this subject has some similarities to the questions 
we consider. In [Ula60], Ulam asked whether it is possible to reconstruct the
isomorphism type of a graph with at least 3 vertices from the isomorphism types
of the subgraphs obtained by deleting each of the vertices. This question was
resolved in the affirmative for combinatorial trees in [Kel57]. Moreover, later
results established that it is not necessary to know the forests obtained by
deleting every vertex. For example, it was shown in [HP66] that it suffices to
know the subtrees obtained by deleting leaves. This latter result was
strengthened in [Man70], where it was found that it is only necessary to know
which nonisomorphic forests are obtained and not what the multiplicity of each
isomorphism type is, and in [Bon69], where it was shown that it suffices to take
only those leaves p that are peripheral in the sense that

\[ \max_{v \in V(T)} r_T(p, v) = \max_{u \in V(T)} \max_{v \in V(T)} r_T(u, v). \]

Along the same lines, it was established in [Lau83] that it is enough to take
only the nonleaf vertices, provided that there are at least three of them. The
line of inquiry in [KS85] is the most similar to ours: an example was presented
of two trees for which the respective sets of vertices may be paired up in such
a way that for each pair the sizes of the trees in the forests produced by
removing each element of the pair from its tree are the same, and a necessary
and sufficient condition was given for a tree to be uniquely reconstructible
from this sort of data, which the authors of [KS85] call the number deck of the
tree.

1.2. Overview of the main results. We will answer Question 1.1 in the
affirmative for a few different classes of trees. Some classes will have general
edge-weights and some classes will be combinatorial trees. It is clear that in
the case of general edge-weights we must restrict to trees that have no vertices
with degree 2 because otherwise we can subdivide any edge into arbitrarily many
edges with the same total length and the joint probability distribution of the
random length sequence will be unchanged - see Figure 1.1. We call trees with no
vertices of degree 2 simple. The terms irreducible or homomorphically
irreducible are also used in the literature.



Figure 1.1. Two non-isomorphic edge-weighted trees that cannot be
distinguished by the joint probability distribution of their random length
sequences.

Our first result is for the class of stars; that is, edge-weighted trees with
n ≥ 3 leaves that have a single interior vertex. Note that such trees are simple.
For any edge-weighted tree with n leaves, $W_n$ is a constant (the total length
of the tree) and $W_n - W_{n-1}$ is a uniformly distributed random pick from the
lengths of the n edges that are adjacent to one of the leaves. The following
simple result is immediate from this observation.

Theorem 1.3. For n ≥ 3 the isomorphism type of a star is uniquely determined
by the joint probability distribution of its random length sequence.

The simple trees with two leaves all consist of a single edge and have a random
length sequence $(W_2)$, where $W_2$ is the length of that edge, and so the
isomorphism type of such a tree is uniquely determined by the joint probability
distribution of its random length sequence. The simple trees with three leaves
are stars, and it follows from Theorem 1.3 that the isomorphism type of such a
tree is uniquely determined by the joint probability distribution of its random
length sequence.

We next consider simple, edge-weighted trees with four leaves. 

Theorem 1.4. For 2 ≤ n ≤ 4, the isomorphism type of a simple, edge-weighted
tree T with n leaves is uniquely determined by the joint probability distribution of 
its random length sequence. 

The proof of this result is via consideration of possible cases. Similar proofs 
could be attempted for larger numbers of leaves, but the main reason we include 
the result is to show how such a proof for even a small number of leaves leads to 
quite a few cases and because we will need the case of four leaves later. 

It is well-known that any simple, combinatorial tree with labeled leaves can
be reconstructed from the simple, combinatorial trees spanned by each subset
of four leaves (the so-called quartets) [SS03, Theorem 6.3.7]. With this and
Theorem 1.4 in mind, one might imagine that the isomorphism type of a simple,
edge-weighted tree can be determined from the joint probability distribution of
$(W_2, W_3, W_4)$. However, putting such a strategy into practice would seem to
be rather complicated because there can be two sets of leaves
$\{y'_1, y'_2, y'_3, y'_4\}$ and $\{y''_1, y''_2, y''_3, y''_4\}$ such that
$\{y'_1, y'_2, y'_3, y'_4\} \ne \{y''_1, y''_2, y''_3, y''_4\}$ but
$W_T(\{y'_1, y'_2\}) = W_T(\{y''_1, y''_2\})$,
$W_T(\{y'_1, y'_2, y'_3\}) = W_T(\{y''_1, y''_2, y''_3\})$, and
$W_T(\{y'_1, y'_2, y'_3, y'_4\}) = W_T(\{y''_1, y''_2, y''_3, y''_4\})$. One way
of ruling out such annoying algebraic coincidences is to assume that the
edge-weighted tree T has edge-weights in general position, by which we mean that
the sums of the lengths of any two different (not necessarily disjoint) subsets
of edges of T are not equal.

Theorem 1.5. The isomorphism type of a simple, edge-weighted tree T with
edge-weights in general position is uniquely determined by the joint probability
distribution of its random length sequence.

The last family of edge-weighted trees with general edge-weights whose elements
we can identify up to isomorphism from the joint probability distributions of
their random length sequences is the class of ultrametric trees. For the sake of
completeness, we now define this class. Recall that for leaves $i, j \in L(T)$
we denote by $r_T(i, j)$ the distance between them; that is, $r_T(i, j)$ is the
sum of the lengths of the edges on the unique path between i and j. The
edge-weighted tree T is ultrametric if for any leaves $i, j, k \in L(T)$ we have

\[ r_T(i, k) \le r_T(i, j) \vee r_T(j, k), \]

from which it follows that for any leaves $i, j, k \in L(T)$ at least two of the
distances $r_T(i, j)$, $r_T(i, k)$, and $r_T(j, k)$ are equal while the third is
no greater than that common value. Equivalently, an edge-weighted tree T is
ultrametric if, when it is thought of as a real tree (that is, a metric space
where the edges are treated as real intervals of varying lengths given by their
edge-weights - see, for example, [Eva08]), then there is a (unique) point
$\rho$ called the root (which may be in the interior of an edge) such that the
distance from $\rho$ to a leaf is the same for all leaves. We will make use of
both definitions. It is immediate from the former definition that the subtree of
an ultrametric tree spanned by a subset of leaves is itself ultrametric.


Theorem 1.6. The isomorphism type of an ultrametric, simple, edge-weighted tree 
T is uniquely determined by the joint probability distribution of its random length 
sequence. 


Remark 1.7. The proof of Theorem 1.6 establishes an even stronger result. Namely,
the isomorphism type of an ultrametric, simple, edge-weighted tree T is uniquely
determined by the minimal element of $\mathcal{J}_T$ in the lexicographic order.


Remark 1.8. We call attention to a subtle point in the statements of Theorem 1.5
and Theorem 1.6. Both results say that if we are given the joint probability
distribution of the random length sequence of an edge-weighted tree T -
information that certainly includes the number of leaves of T - and we know, a
priori, that T has a certain extra property (edge-weights in general position or
ultrametricity), then we can determine the isomorphism type of T. The theorems
do not, however, say whether it is possible to determine from the joint
probability distribution of its random length sequence whether a simple,
edge-weighted tree T has its edge-weights in general position or is ultrametric.
We do not have results that settle this question, but we say some more about it
in Equation (4.1) and believe it is an interesting area for future research.


Observe that if T is an edge-weighted tree, a is any vertex of T, and c is a
constant such that $c > \max\{r_T(a, i) : i \in L(T)\}$, then
$\tilde{r}_T : L(T) \times L(T) \to \mathbb{R}_+$ defined by

\[ \tilde{r}_T(i, j) := c + \tfrac{1}{2} \bigl( r_T(i, j) - r_T(a, i) - r_T(a, j) \bigr), \quad i \ne j, \]

and

\[ \tilde{r}_T(i, i) := 0, \]

is an ultrametric on L(T) that arises from suitable edge-weights on T. The metric
$\tilde{r}_T$ is often called the Farris transform of $r_T$ - see [DHM07] for a
review of the many appearances of this object in various areas from
phylogenetics to metric geometry. It might be hoped that an affirmative answer
to Question 1.1 for general edge-weighted trees will follow from Theorem 1.6.
However, we have been unable to find an argument which shows that the joint
probability distribution of the random length sequence of the tree T equipped
with the new edge-weights is determined by the joint probability distribution of
the random length sequence for the original edge-weights.
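
The Farris transform is easy to experiment with numerically. The following sketch, in which the function names, the example tree, and the choices of base vertex and constant are all hypothetical, computes the transformed leaf-to-leaf distances for a four-leaf tree and checks the ultrametric inequality for every ordered triple of distinct leaves. Integer edge lengths are assumed so that the halved quantities are represented exactly.

```python
from itertools import permutations


def weighted_tree(edges):
    """Adjacency map {u: {v: length}} from a list of (u, v, length) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def distances_from(adj, source):
    """Path lengths from `source` to every vertex of the tree."""
    dist, stack = {source: 0}, [source]
    while stack:
        v = stack.pop()
        for u, w in adj[v].items():
            if u not in dist:
                dist[u] = dist[v] + w
                stack.append(u)
    return dist


def leaf_metric(adj):
    """Dictionary {(i, j): distance} over ordered pairs of leaves."""
    leaves = [v for v in adj if len(adj[v]) == 1]
    return {(i, j): d for i in leaves
            for j, d in distances_from(adj, i).items() if j in leaves}


def farris_transform(adj, base, c):
    """Transformed distances c + (r(i,j) - r(base,i) - r(base,j)) / 2 for i != j."""
    r = leaf_metric(adj)
    from_base = distances_from(adj, base)
    if c <= max(from_base[i] for i, _ in r):
        raise ValueError("c must exceed every distance from base to a leaf")
    return {(i, j): 0 if i == j
            else c + (r[i, j] - from_base[i] - from_base[j]) / 2
            for (i, j) in r}


def is_ultrametric(r):
    """Check r(i,k) <= max(r(i,j), r(j,k)) for all triples of distinct points."""
    points = {i for i, _ in r}
    return all(r[i, k] <= max(r[i, j], r[j, k])
               for i, j, k in permutations(points, 3))


quartet = weighted_tree([("l1", "u", 1), ("l2", "u", 2), ("u", "v", 3),
                         ("l3", "v", 4), ("l4", "v", 5)])
transformed = farris_transform(quartet, base="u", c=10)
print(is_ultrametric(leaf_metric(quartet)), is_ultrametric(transformed))
```

In this example the original leaf-to-leaf distances fail the ultrametric inequality while the transformed ones satisfy it.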

Suppose that T is a rooted, simple combinatorial tree with root $\rho$. We can
define a partial order on V(T) by declaring that $x \le y$ if x is on the unique
path from $\rho$ to y. Two vertices $x, y \in V(T)$ have a unique greatest lower
bound in this partial order that we write as $x \wedge y$ and call the most
recent common ancestor of x and y. The map
$\tilde{r}_T : L(T) \times L(T) \to \mathbb{R}_+$ defined by

\[ \tilde{r}_T(i, j) := \#\{k \in L(T) : i \wedge j \le k\} \]

is an ultrametric on L(T) and hence it arises from a collection of edge-weights
$\tilde{W}_T$ on T. A directed edge (x, y) in T with x < y is necessarily of the
form $x = i \wedge j = i \wedge k$ and $y = j \wedge k$ for some
$i, j, k \in L(T)$. If e = {x, y} is the corresponding undirected edge, then

\[ \tilde{W}_T(e) = \tfrac{1}{2} \bigl( \tilde{r}_T(i, j) - \tilde{r}_T(j, k) \bigr)
 = \tfrac{1}{2} \bigl( \#\{\ell \in L(T) : x \le \ell\} - \#\{\ell \in L(T) : y \le \ell\} \bigr). \]

Therefore, if T' is a subtree of T spanned by some set of leaves
$K \subseteq L(T)$ and D(T') is the set of directed edges of T', then we have
that the length of T' is

\[ \tfrac{1}{2} \sum_{(x, y) \in D(T')} \sum_{\ell \in L(T)} \bigl( 1\{x \le \ell\} - 1\{y \le \ell\} \bigr)
 = \tfrac{1}{2} \, \#\{((x, y), \ell) \in D(T') \times L(T) : x \le \ell, \; y \not\le \ell\}. \]

The following result is immediate from Theorem 1.6 and

Corollary 1.9. The isomorphism type of a simple, combinatorial tree T is uniquely
determined by the minimal element of the set $\mathcal{J}_T$ of length sequences
obtained after designating a root for T and equipping T with the edge-weights
$\tilde{W}_T$.

We now turn our focus to combinatorial trees and drop the assumption of
simplicity. That is, all edge-weights are equal to one and there may be vertices
with degree two. We answer Question 1.1 in the affirmative for two families of
combinatorial trees.


Figure 1.2. A caterpillar tree. Removing the leaves (white vertices) results
in a path of length 5 (black vertices).

First, a combinatorial tree T is a caterpillar if the deletion of the leaves
along with the edges adjacent to them results in a path with $\ell + 1$ vertices
(and hence $\ell$ edges) - see, for example, Figure 1.2. Choose some direction
for the path and number from 0 to $\ell$ the vertices on the path encountered
successively in that direction and write $n_i$ for the number of leaves adjacent
to the vertex numbered i. Note that $n_0 \ge 1$ and $n_\ell \ge 1$. Two sequences
$n'_0, \ldots, n'_{\ell'}$ and $n''_0, \ldots, n''_{\ell''}$ correspond to
isomorphic trees if and only if $\ell' = \ell'' = \ell$, say, and either
$n'_i = n''_i$, $0 \le i \le \ell$, or $n'_i = n''_{\ell - i}$,
$0 \le i \le \ell$.

Theorem 1.10. The isomorphism type of a caterpillar is uniquely determined by 
the joint probability distribution of its random length sequence. Furthermore, it is 
possible to determine from the joint probability distribution of the random length 
sequence of a combinatorial tree whether the tree is a caterpillar. 

Our final results are for the classes of (unrooted) (k + 1)-valent and rooted
k-ary combinatorial trees. For k ≥ 2, a (k + 1)-valent combinatorial tree is a
combinatorial tree for which all vertices have degree either k + 1 (the internal
vertices) or 1 (the leaves). For k ≥ 2, a rooted k-ary combinatorial tree is a
combinatorial tree for which one internal vertex (the root) has degree k and the
remaining internal vertices have degree k + 1; the leaves, of course, have
degree 1. When k = 2 we refer to a rooted 2-ary combinatorial tree as a rooted
binary combinatorial tree. Attaching an extra vertex via an edge to the root of
a rooted k-ary tree produces a (k + 1)-valent combinatorial tree.

Theorem 1.11. The isomorphism type of a (k + 1)-valent combinatorial tree
(respectively, a k-ary tree) is uniquely determined by the joint probability
distribution of its random length sequence.

In fact, our proof of Theorem 1.11 leads us to a stronger conclusion.

Theorem 1.12. Fix n > 1. Let $\mathbf{T}$ be a random (k + 1)-valent
combinatorial tree (respectively, a random k-ary combinatorial tree) with n
leaves. Then, the probability distribution of the isomorphism type of
$\mathbf{T}$ is uniquely determined by the joint probability distribution of its
random length sequence.

Note that in Theorem 1.12 there are two sources of randomness in the
construction of the random length sequence: we first choose a realization of the
random $\mathbf{T}$ and then take an independent uniform random listing of the
leaves to build the increasing sequence of subtrees and their lengths.

The rest of the paper consists primarily of proofs of the above results in the
order we have presented them. We also briefly discuss further open questions
related to Question 1.1.

2. Trees with up to n = 4 leaves: Proof of Theorem 1.4

We begin by looking at Question 1.1 for edge-weighted trees with a small
number of leaves and give a proof of Theorem 1.4 that answers Question 1.1 in the
affirmative for general, simple edge-weighted trees with n = 2, 3 or 4 leaves.

The case of Theorem 1.4 for simple trees with n = 2 leaves is trivial, as all such
trees have two leaves and one edge, $\mathbf{W}_T = (W_2)$ in this case, and
$W_2$ is the length of the edge.

The case of n = 3 leaves is only slightly more complicated, as all such trees are
star-shaped. Thus, determining T from $\mathbf{W}_T$ consists of determining its
three edge weights. These can be inferred easily from $\mathbf{W}_T$ by looking
at the distribution of $W_3 - W_2$, which, since $W_3$ is constant (equal to the
total length of T), is distributed as a uniform random choice from the three
edge weights.

Finally, we give a proof of Theorem 1.4 in the case when n = 4.


Proof. For n = 4 leaves, there are two possible simple combinatorial trees, and
hence two possibilities for the shape of T. The first is the star-shaped tree with
four edges and one interior vertex. The second is the 3-valent tree with two
interior vertices and one interior edge. See Figure 2.1.

Figure 2.1. The two possible simple combinatorial trees with n = 4 leaves.

To determine which possibility T is, we first look at the distribution of
$W_4 - W_3$ to find the lengths of the four edges connecting directly to the
four leaves. Call these edges pendent. If the sum of the four pendent edge
lengths equals $W_4$, then T is star-shaped and we have determined T up to
isomorphism. If not, then T is 3-valent and the difference between $W_4$ and the
sum of the pendent edge lengths is the length e of the interior edge. All that
is left to determine T up to isomorphism in this second case is determining how
the pendent edges pair on each side of the interior edge.

First, if the multiset of the lengths of pendent edges is of the form {a, a, a, a} or 
{a,a,a,b}, then T is already uniquely determined. 

Next, if the multiset is of the form {a, a, b, b}, then we need to distinguish
between the case where the leaves with pendent edges of length a are siblings
(and thus so are the leaves with pendent edge length b) and the case where
leaves with pendent edge lengths a and b are paired. In the former case the
possible values of $W_2$ are a + a, b + b, a + b + e with respective
probabilities $\frac{4}{24}, \frac{4}{24}, \frac{16}{24}$, whereas in the latter
case the possible values of $W_2$ are a + b, a + b + e, a + a + e, b + b + e with
respective probabilities
$\frac{8}{24}, \frac{8}{24}, \frac{4}{24}, \frac{4}{24}$, and we can certainly
distinguish between the two cases.
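
The distribution of $W_2$ for a four-leaf tree with an interior edge can be tabulated directly, since the first two leaves of a uniform random listing form a uniformly chosen unordered pair of leaves. The sketch below is only illustrative: the function name `w2_distribution` and the numerical values chosen for a, b, and e are assumptions, not part of the proof.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def w2_distribution(left, right, e):
    """Distribution of W_2 for a 3-valent tree with four leaves.

    `left` and `right` are the pairs of pendent edge lengths on the two sides
    of the interior edge, which has length e. Each of the six unordered pairs
    of leaves is equally likely to be the first two leaves sampled.
    """
    pendent = [(w, 0) for w in left] + [(w, 1) for w in right]
    counts = Counter()
    for (w1, side1), (w2, side2) in combinations(pendent, 2):
        # The interior edge is used only if the two leaves are on opposite sides.
        counts[w1 + w2 + (e if side1 != side2 else 0)] += 1
    return {value: Fraction(n, 6) for value, n in counts.items()}


a, b, e = 1, 2, 5
siblings_equal = w2_distribution((a, a), (b, b), e)  # a with a, b with b
siblings_mixed = w2_distribution((a, b), (a, b), e)  # a with b on both sides
print(siblings_equal)
print(siblings_mixed)
```

With these values the two pairings produce visibly different distributions; if several of the sums happen to coincide, the corresponding probabilities are simply merged.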

If the multiset of pendent edge lengths is of the form {a,a,b,c}, then there are
the following two possibilities:

(P1) the two leaves with pendent edge length a are siblings and the two leaves
with pendent edge lengths b and c are siblings, in which case the possible
values of $W_2$ are $a + a$, $a + b + e$, $a + c + e$, $b + c$ with respective
probabilities $\frac{4}{24}, \frac{8}{24}, \frac{8}{24}, \frac{4}{24}$;

(P2) a leaf with pendent edge length a is the sibling of the one with pendent
edge length b and the other leaf with pendent edge length a is the sibling
of the one with pendent edge length c, in which case the possible values of
$W_2$ are $a + b$, $a + c$, $a + a + e$, $a + b + e$, $a + c + e$, $b + c + e$,
each with probability $\frac{4}{24}$.

Suppose without loss of generality that $b < c$. If $a < b < c$, then
$\mathbb{P}\{W_2 = a + a\}$ is $\frac{4}{24}$ for (P1) and $0$ for (P2). If
$b < a < c$ or $b < c < a$, then $\mathbb{P}\{W_2 = a + c + e\}$ is
$\frac{8}{24}$ for (P1) and $\frac{4}{24}$ for (P2). In all cases we can
distinguish between (P1) and (P2).

Finally, if the multiset of pendent edge lengths is of the form {a,b,c,d}, then
there are the following three possibilities:

(P3) the leaf with pendent edge length a is paired with the one with pendent
edge length b and the leaf with pendent edge length c is paired with the
one with pendent edge length d, in which case the possible values of $W_2$ are
$a + b$, $c + d$, $a + c + e$, $a + d + e$, $b + c + e$, $b + d + e$ with common
probability $\frac{4}{24}$;

(P4) the leaf with pendent edge length a is paired with the one with pendent
edge length c and the leaf with pendent edge length b is paired with the
one with pendent edge length d, in which case the possible values of $W_2$ are
$a + c$, $b + d$, $a + b + e$, $a + d + e$, $b + c + e$, $c + d + e$ with common
probability $\frac{4}{24}$;

(P5) the leaf with pendent edge length a is paired with the one with pendent
edge length d and the leaf with pendent edge length b is paired with the
one with pendent edge length c, in which case the possible values of $W_2$ are
$a + d$, $b + c$, $a + c + e$, $a + b + e$, $c + d + e$, $b + d + e$ with common
probability $\frac{4}{24}$.

Suppose without loss of generality that $a < b < c < d$. Then possibility (P3)
holds if and only if $\mathbb{P}\{W_2 = a + b\} > 0$ and possibility (P5) holds if
and only if $\mathbb{P}\{W_2 = a + b\} = 0$ and $\mathbb{P}\{W_2 = b + d + e\} > 0$,
so we can distinguish between (P3), (P4) and (P5). □
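The case analysis above can be checked by direct enumeration. The following is a minimal Python sketch, not part of the original argument: the function name `w2_distribution` and the numerical values a = 1, b = 2, e = 5 are illustrative assumptions. It lists the six unordered pairs of leaves of a 4-leaf tree with one interior edge and tabulates the resulting distribution of $W_2$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def w2_distribution(pairing, pendent, e):
    """Distribution of W_2 for a 4-leaf tree with a single interior edge.

    pairing: two pairs of sibling leaf indices, e.g. ((0, 1), (2, 3))
    pendent: the four pendent edge lengths, indexed by leaf
    e:       length of the interior edge
    """
    side = {leaf: i for i, pair in enumerate(pairing) for leaf in pair}
    dist = Counter()
    for u, v in combinations(range(4), 2):
        # Siblings are joined by their two pendent edges only; leaves on
        # opposite sides of the interior edge also traverse that edge.
        length = pendent[u] + pendent[v] + (e if side[u] != side[v] else 0)
        dist[length] += Fraction(1, 6)  # each unordered pair has probability 4/24
    return dict(dist)


# Multiset {a, a, b, b} with hypothetical values a = 1, b = 2, e = 5.
siblings_equal = w2_distribution(((0, 1), (2, 3)), (1, 1, 2, 2), 5)
siblings_mixed = w2_distribution(((0, 2), (1, 3)), (1, 1, 2, 2), 5)
print(siblings_equal)
print(siblings_mixed)
```

The two printed distributions differ, which is the distinction used in the {a, a, b, b} case of the proof.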

The argument in the proof of Theorem 1.4 seems rather ad hoc and it does not
suggest a systematic approach to obtaining the analogous result for trees with an
arbitrary number of leaves. The number of simple combinatorial trees with n
leaves grows so rapidly with n (see, for example, [Fel04]) that even for trees with
a relatively small fixed number of leaves a case-by-case argument seems rather
forbidding. Nonetheless, we do conjecture that an affirmative answer to our question
holds more generally.


3. Trees in general position: Proof of Theorem 1.5

Recall that the edge-weights of a simple, edge-weighted tree T are in general
position if the sums of the lengths of any two distinct subsets of edges of T are not
equal.


Proof. By assumption, if $\{y'_1, \dots, y'_k\}$ and $\{y''_1, \dots, y''_k\}$ are two
subsets of $L(T)$ such that $W_T(\{y'_1, \dots, y'_k\}) = W_T(\{y''_1, \dots, y''_k\})$,
then $\{y'_1, \dots, y'_k\} = \{y''_1, \dots, y''_k\}$. Consequently, if
$\{y'_1, \dots, y'_k\}$ and $\{y''_1, \dots, y''_k\}$ are two subsets of $L(T)$ such that
$W_T(\{y'_1, \dots, y'_j\}) = W_T(\{y''_1, \dots, y''_j\})$ for $2 \le j \le k$, then
$\{y'_1, y'_2\} = \{y''_1, y''_2\}$ and $y'_j = y''_j$ for $3 \le j \le k$.

Recall that $Y_1, \dots, Y_n$ are the successive randomly chosen leaves used in the
construction of $\mathcal{W}_T = (W_2, \dots, W_n)$.

Because $W_n - W_{n-1}$ is the length of the pendent edge attaching $Y_n$ to the rest
of T, it follows that the set $C := \{\ell > 0 : \mathbb{P}\{W_n - W_{n-1} = \ell\} > 0\}$
has n elements and $\mathbb{P}\{W_n - W_{n-1} = \ell\} = \frac{1}{n}$ for each
$\ell \in C$. There are at least two leaves of T that are siblings, and so there exist
$\ell', \ell'' \in C$ such that $\mathbb{P}\{W_2 = \ell' + \ell''\} > 0$. Fix such
a pair of lengths and write $x_1$ and $x_2$ for the (unique) leaves of T with pendent
edges having respective lengths $\ell'$ and $\ell''$. We have
$\mathbb{P}\{W_2 = \ell' + \ell''\} = \binom{n}{2}^{-1}$ and
the event $\{W_2 = \ell' + \ell''\}$ coincides with the event
$\{\{Y_1, Y_2\} = \{x_1, x_2\}\}$.

By assumption, the set
$D := \{\ell > 0 : \mathbb{P}\{W_3 - W_2 = \ell \mid W_2 = \ell' + \ell''\} > 0\}$
has $n - 2$ elements and
$\mathbb{P}\{W_3 - W_2 = \ell \mid W_2 = \ell' + \ell''\} = \frac{1}{n-2}$ for each
$\ell \in D$. Index the values of D as $\ell_3, \dots, \ell_n$ and write $x_k$,
$3 \le k \le n$, for the unique leaf of T that is distance $\ell_k$ from the unique
vertex of T that is adjacent to both of the sibling leaves $x_1$ and $x_2$. We will
show that it is possible to determine the leaf-to-leaf distances $r_T(x_i, x_j)$,
$1 \le i, j \le n$. As we recalled in the Introduction, this information uniquely
identifies the isomorphism type of T.

Again by assumption, the set
$E := \{\ell > 0 : \mathbb{P}\{W_4 = \ell \mid W_2 = \ell' + \ell''\} > 0\}$ has
$\binom{n-2}{2}$ elements and
$\mathbb{P}\{W_4 = \ell \mid W_2 = \ell' + \ell''\} = \binom{n-2}{2}^{-1}$ for each
$\ell \in E$. For a given $\ell \in E$ there is a unique ordered pair $(x_i, x_j)$,
$3 \le i \ne j \le n$, and a unique $e \ge 0$ such that

$$\mathbb{P}\{W_3 - W_2 = \ell_i,\ W_4 - W_3 = \ell_j - e \mid W_2 = \ell' + \ell'',\ W_4 = \ell\} > 0$$

and

$$\mathbb{P}\{W_3 - W_2 = \ell_j,\ W_4 - W_3 = \ell_i - e \mid W_2 = \ell' + \ell'',\ W_4 = \ell\} > 0,$$

in which case the two conditional probabilities in question are both $\frac{1}{2}$.
Moreover, every ordered pair $(x_i, x_j)$, $3 \le i \ne j \le n$, corresponds to some
unique $\ell \in E$ and $e \ge 0$ in this way. The event
$\{W_2 = \ell' + \ell'',\ W_3 - W_2 = \ell_i,\ W_4 - W_3 = \ell_j - e,\ W_4 = \ell\}$
coincides with the event $\{\{Y_1, Y_2\} = \{x_1, x_2\},\ Y_3 = x_i,\ Y_4 = x_j\}$ and
the event
$\{W_2 = \ell' + \ell'',\ W_3 - W_2 = \ell_j,\ W_4 - W_3 = \ell_i - e,\ W_4 = \ell\}$
coincides with the event $\{\{Y_1, Y_2\} = \{x_1, x_2\},\ Y_3 = x_j,\ Y_4 = x_i\}$.
Considering the subtree of T spanned by $\{x_1, x_2, x_i, x_j\}$ and ignoring the
vertices with degree two to produce a simple tree, the leaves $x_i$ and $x_j$ are
siblings in this simple tree (as are $x_1$ and $x_2$), and the quantity e is the
distance between the vertex in the subtree to which $x_i$ and $x_j$ are adjacent and
the vertex to which $x_1$ and $x_2$ are adjacent; the lengths of the pendent edges
connecting $x_i$ and $x_j$ to the rest of the subtree are $\ell_i - e$ and
$\ell_j - e$. Thus, if the ordered pair $(x_i, x_j)$ corresponds to $\ell \in E$ and
$e \ge 0$, then, recalling the notation $r_T$ for the path length distance in T,
$r_T(x_1, x_2) = \ell' + \ell''$, $r_T(x_1, x_i) = \ell' + \ell_i$,
$r_T(x_1, x_j) = \ell' + \ell_j$, $r_T(x_2, x_i) = \ell'' + \ell_i$,
$r_T(x_2, x_j) = \ell'' + \ell_j$, and $r_T(x_i, x_j) = \ell_i + \ell_j - 2e$.

Therefore, the joint probability distribution of the random length sequence
$\mathcal{W}_T$ uniquely determines the matrix of leaf-to-leaf distances in T and
hence the isomorphism type of T.

□ 
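The bookkeeping in this proof can be illustrated on a small example. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the helper `spanned_lengths` and the matrix `QUARTET` are hypothetical names, and the tree is a 4-leaf tree with pendent edge lengths 1, 2, 4, 8 and interior edge length 16 (powers of two, so the weights are in general position). It computes the lengths of spanned subtrees from leaf-to-leaf distances, using the fact that adding a leaf x increases the length by the distance from x to the current subtree, and then recovers $r_T(x_i, x_j) = \ell_i + \ell_j - 2e$.

```python
from itertools import combinations


def spanned_lengths(dist, order):
    """Return [W_2, ..., W_n] for the leaves taken in the given order.

    dist[u][v] is the path-length distance between leaves u and v. The
    distance from a new leaf x to the subtree spanned by earlier leaves is
    the minimum over pairs u, v of earlier leaves of
    (d(x, u) + d(x, v) - d(u, v)) / 2.
    """
    lengths = [dist[order[0]][order[1]]]
    for k in range(2, len(order)):
        x = order[k]
        gap = min((dist[x][u] + dist[x][v] - dist[u][v]) / 2
                  for u, v in combinations(order[:k], 2))
        lengths.append(lengths[-1] + gap)
    return lengths


# Hypothetical tree: leaves 0, 1 are siblings, leaves 2, 3 are siblings.
QUARTET = [
    [0, 3, 21, 25],
    [3, 0, 22, 26],
    [21, 22, 0, 12],
    [25, 26, 12, 0],
]

w2, w3, w4 = spanned_lengths(QUARTET, [0, 1, 2, 3])
ell_i = w3 - w2                                      # distance from leaf 2 to the sibling vertex
ell_j = spanned_lengths(QUARTET, [0, 1, 3])[1] - w2  # the same quantity for leaf 3
e = ell_j - (w4 - w3)                                # shared part of the two paths
print(ell_i + ell_j - 2 * e, QUARTET[2][3])          # the two values agree
```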


4. Ultrametric trees: Proof of Theorem 1.6


Recall that $\mathcal{F}_T$ is the set of sequences $(\ell_2, \dots, \ell_n)$ such that
$\mathbb{P}\{W_k = \ell_k,\ 2 \le k \le n\} > 0$. Write $\prec$ for the usual
lexicographic total order on $\mathcal{F}_T$ (that is, $\ell' \prec \ell''$ if
in the first coordinate where the two sequences differ the entry of $\ell'$ is smaller
than the entry of $\ell''$). Equivalently, $\ell' \prec \ell''$ if either
$\ell'_2 < \ell''_2$ or $\ell'_2 = \ell''_2$ and for the smallest $k \ge 2$ such that
$\ell'_{k+1} - \ell'_k \ne \ell''_{k+1} - \ell''_k$ we have
$\ell'_{k+1} - \ell'_k < \ell''_{k+1} - \ell''_k$. In this section we prove
Theorem 1.6 by showing that the tree T is determined up to isomorphism by the
minimal element of $\mathcal{F}_T$.

We use a similar technique (but with a different total order) to establish
Theorem 1.11 for (k + 1)-valent and rooted k-ary combinatorial trees in Section 6.


Proof. Let $(\ell_2, \ell_3, \dots, \ell_n)$ be the minimal element of $\mathcal{F}_T$.
Write $x_1, x_2, \dots, x_n$ for an ordering of $L(T)$ such that
$\ell_k = W_T(\{x_1, x_2, \dots, x_k\})$ for $k = 2, \dots, n$.

We will establish by induction that for $2 \le k \le n$ the ultrametric real tree
spanned by the leaves $\{x_1, x_2, \dots, x_k\}$ can be reconstructed from
$(\ell_2, \ell_3, \dots, \ell_k)$ and, moreover, if we adopt the convention that we
draw ultrametric real trees in the plane with the root at the top and leaves along
the bottom, then this particular real tree can be embedded in the plane with the
leaves $x_1, x_2, \dots, x_k$ in order from left to right.

The claim is certainly true when k = 2. Suppose the claim is true for 2, 3, ..., k.

Write $T_k$ for the ultrametric real tree spanned by $\{x_1, x_2, \dots, x_k\}$ and
denote the height of $T_k$ by $h_k$, that is, $h_k$ is the common distance from each
of the leaves of $T_k$ to the root $\rho_k$ of $T_k$. We can, of course, suppose that
$T_2 \subseteq T_3 \subseteq \dots \subseteq T_n$.

If $\ell_{k+1} - \ell_k > h_k$, then the ultrametric real tree $T_{k+1}$ spanned by
$\{x_1, x_2, \dots, x_k, x_{k+1}\}$ must consist of $T_k$, an arc of length

$$\tfrac{1}{2}(\ell_{k+1} - \ell_k + h_k)$$

from the root $\rho_{k+1}$ of $T_{k+1}$ to the leaf $x_{k+1}$ and an arc of length
$\tfrac{1}{2}(\ell_{k+1} - \ell_k - h_k)$ from the "new root" $\rho_{k+1}$ to the
"old root" $\rho_k$. In this case we can, by the inductive hypothesis, certainly
embed $T_{k+1}$ in the plane with the leaf $x_{k+1}$ to the right of the leaves
$x_1, x_2, \dots, x_k$.

Assume, therefore, that $\ell_{k+1} - \ell_k \le h_k$. Then the ultrametric real tree
$T_{k+1}$ must consist of $T_k$ and an arc of length $\ell_{k+1} - \ell_k$ joining
$x_{k+1}$ to a point $y \in T_k$. It will suffice to show that y must be on the arc
$[\rho_k, x_k]$ that connects $\rho_k$ to $x_k$ because there is a unique ultrametric
real tree consisting of $T_k$ and an arc of length $\ell_{k+1} - \ell_k$ joining a new
leaf to a point on the arc $[\rho_k, x_k]$ (this tree must have root $\rho_k$ and the
point where the arc of length $\ell_{k+1} - \ell_k$ attaches to $[\rho_k, x_k]$ must
be at distance $h_k - (\ell_{k+1} - \ell_k)$ from $\rho_k$) and, moreover, such a tree
can be embedded in the plane with the new leaf to the right of the leaves
$\{x_1, x_2, \dots, x_k\}$.

Suppose, then, that y is not on the arc $[\rho_k, x_k]$. Let j be the maximum of the
indices $i < k$ such that y is on the arc connecting $x_i$ to $\rho_k$. Write u for
the point that is closest to $x_{j+1}$ in the subtree spanned by
$\{x_1, x_2, \dots, x_j\}$ and $\rho_k$. Write v for the point that is closest to
$x_{j+1}$ in the subtree spanned by $\{x_1, x_2, \dots, x_j\}$. Equivalently, v is the
point in the subtree spanned by $\{x_1, x_2, \dots, x_j\}$ that is closest to u. We
may, of course, have $u = v$ (which occurs if and only if $h_{j+1} = h_j$). By
the inductive hypothesis, u and v are on the arc connecting $x_j$ to $\rho_k$ and

$$r_T(x_{j+1}, u) + r_T(u, v) = \ell_{j+1} - \ell_j.$$

By construction, y is the point closest to $x_{k+1}$ in the subtree spanned by
$\{x_1, x_2, \dots, x_j\}$ and $\rho_k$. Write w for the point closest to $x_{k+1}$ in
the subtree spanned by $\{x_1, x_2, \dots, x_j\}$. Equivalently, w is the point in the
subtree spanned by $\{x_1, x_2, \dots, x_j\}$ that is closest to y. We have

$$W_T(\{x_1, \dots, x_j, x_{k+1}\}) - \ell_j = r_T(x_{k+1}, y) + r_T(y, w).$$

By the definition of j, the points y and u are on the arc connecting $x_j$ to $\rho_k$
and $r_T(u, x_j) > r_T(y, x_j)$. This implies that $r_T(u, v) \ge r_T(y, w)$. It also
implies, by ultrametricity, that

$$r_T(x_{k+1}, y) = r_T(x_j, y) < r_T(x_j, u).$$

Consequently,

$$W_T(\{x_1, \dots, x_j, x_{k+1}\}) < \ell_{j+1}.$$

This, however, contradicts the minimality of $(\ell_2, \dots, \ell_n)$. □
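The inductive step of the proof can be turned into a short reconstruction routine. The following Python sketch is an illustration under assumptions of our own: the function name `ultrametric_from_sequence` and the choice to encode the reconstructed tree by its matrix of leaf-to-leaf distances are not taken from the paper. It follows the two cases above: when the increment exceeds the current height a new root is created, and otherwise the new leaf is attached to the arc from the root to the previously added leaf.

```python
def ultrametric_from_sequence(seq):
    """Rebuild leaf-to-leaf distances from a sequence (l_2, ..., l_n).

    seq is assumed to be the lexicographically minimal length sequence of an
    ultrametric tree. Leaf x_{i+1} is stored at index i of the returned matrix.
    """
    n = len(seq) + 1
    d = [[0] * n for _ in range(n)]
    d[0][1] = d[1][0] = seq[0]
    height = seq[0] / 2                       # h_2
    for k in range(2, n):                     # attach leaf x_{k+1} at index k
        inc = seq[k - 1] - seq[k - 2]         # l_{k+1} - l_k
        if inc > height:
            # New root above the old one; the new height is (inc + h_k) / 2.
            height = (inc + height) / 2
            for i in range(k):
                d[i][k] = d[k][i] = 2 * height
        else:
            # Attach at height inc on the arc from the root to leaf x_k.
            for i in range(k):
                d[i][k] = d[k][i] = max(2 * inc, d[i][k - 1])
    return d


# Hypothetical example: leaves A, B at distance 2, leaves C, D at distance 4,
# and all other pairs at distance 10; its minimal sequence is (2, 11, 13).
print(ultrametric_from_sequence([2, 11, 13]))
```

Encoding the tree by distances keeps the sketch short; building an explicit vertex and edge structure would follow the same two cases.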

Remark 4.1. As we noted in 1.8, it is interesting to know whether it is possible
to determine from the joint probability distribution of the random length sequence
whether an edge-weighted tree is ultrametric. The preceding proof of Theorem 1.6
contains a procedure for reconstructing T from the minimal element of
$\mathcal{F}_T$ in the lexicographic order when T is an ultrametric tree. If T is an
arbitrary edge-weighted tree and this procedure is applied to the minimal element of
$\mathcal{F}_T$ in the lexicographic order, then it will still produce an ultrametric
tree and so a necessary condition for T to be ultrametric is that the joint
probability distribution of the random length sequence of this ultrametric tree
coincides with the joint probability distribution of $\mathcal{W}_T$.

Along the same lines, suppose that T is an arbitrary edge-weighted tree and,
thinking of T as a real tree, we root it at the unique point $\rho$ such that

$$\max_{v \in L(T)} r_T(\rho, v) = r^* := \frac{1}{2} \max_{u \in L(T)} \max_{v \in L(T)} r_T(u, v).$$

Then $\rho$ will have k children for some k. Let $m_i$, $1 \le i \le k$, be the number
of leaves v in the subtree below the $i$-th child of $\rho$ such that
$r_T(\rho, v) = r^*$. It is clear that T is ultrametric if and only if
$m_1 + \cdots + m_k = n$. Let $n_1, \dots, n_\ell$ be a listing of the nonzero terms
in the list $m_1, \dots, m_k$. Note for $2 \le j \le \ell$ that

$$\mathbb{P}\{W_2 = 2r^*, \dots, W_j = j r^*\} = \frac{j!}{n(n-1) \cdots (n-j+1)} \sum_{1 \le h_1 < \cdots < h_j \le \ell} n_{h_1} \cdots n_{h_j}$$

and

$$\max\{j \ge 2 : \mathbb{P}\{W_2 = 2r^*, \dots, W_j = j r^*\} > 0\} = \ell.$$

Thus, the joint probability distribution of $\mathcal{W}_T$ determines $\ell$ and the
values of the elementary symmetric polynomials of degrees $2 \le j \le \ell$ evaluated
at $n_1, \dots, n_\ell$, and we want to know whether $n_1 + \cdots + n_\ell$, the value
of the elementary symmetric polynomial of degree 1 evaluated at $n_1, \dots, n_\ell$,
is n. The elementary symmetric polynomials of degrees $1, 2, \dots, \ell$ in $\ell$
real variables are algebraically independent over the reals, and so we cannot expect
to recover $n_1 + \cdots + n_\ell$ from the values of the other elementary symmetric
polynomials. However, there are inequalities connecting the values of the various
elementary symmetric polynomials that can be used to establish necessary conditions
and sufficient conditions for T to be ultrametric. For example, set

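$$p_1 := \frac{1}{\ell}(n_1 + \cdots + n_\ell)$$

and

$$p_j := \binom{\ell}{j}^{-1} \sum_{1 \le h_1 < \cdots < h_j \le \ell} n_{h_1} \cdots n_{h_j}
= \binom{\ell}{j}^{-1} \frac{n(n-1) \cdots (n-j+1)}{j!} \, \mathbb{P}\{W_2 = 2r^*, \dots, W_j = j r^*\}, \quad 2 \le j \le \ell.$$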


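If $\alpha_1, \dots, \alpha_\ell$ and $\beta_1, \dots, \beta_\ell$ are positive
constants such that

$$\alpha_1 + 2\alpha_2 + \cdots + \ell \alpha_\ell = \beta_1 + 2\beta_2 + \cdots + \ell \beta_\ell$$

and

$$\alpha_j + 2\alpha_{j+1} + \cdots + (\ell - j + 1)\alpha_\ell \ge \beta_j + 2\beta_{j+1} + \cdots + (\ell - j + 1)\beta_\ell, \quad 2 \le j \le \ell, \tag{4.1}$$

then, by [HLP88, Theorem 77, Chapter II],

$$\prod_{i=1}^{\ell} p_i^{\alpha_i} \le \prod_{i=1}^{\ell} p_i^{\beta_i}.$$

Thus, if $\alpha_2, \dots, \alpha_\ell$ and $\beta_2, \dots, \beta_\ell$ satisfy the
inequalities (4.1) and

$$\gamma := \sum_{i=2}^{\ell} i(\beta_i - \alpha_i),$$

then

$$p_1 \le \prod_{i=2}^{\ell} p_i^{(\beta_i - \alpha_i)/\gamma}$$

when $\gamma > 0$, and the opposite inequality holds when $\gamma < 0$. This
observation leads to necessary conditions and sufficient conditions for T to be
ultrametric.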
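The probability formula in Remark 4.1 is a counting statement: the first j sampled leaves must be leaves at distance $r^*$ from $\rho$ lying below j distinct children of $\rho$. The following Python sketch checks that count by brute force. It is an illustration only: the function names and the example labelling are assumptions, and the check concerns the combinatorial event rather than an actual tree.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial, prod


def elementary_symmetric(values, j):
    """Elementary symmetric polynomial of degree j evaluated at values."""
    return sum(prod(c) for c in combinations(values, j))


def formula_probability(group_sizes, n, j):
    """j! e_j(n_1, ..., n_l) / (n (n - 1) ... (n - j + 1))."""
    falling = prod(range(n - j + 1, n + 1))
    return Fraction(factorial(j) * elementary_symmetric(group_sizes, j), falling)


def brute_force_probability(labels, j):
    """Probability that the first j sampled leaves lie in j distinct groups.

    labels[i] is the group (child of the root) of leaf i if that leaf is at
    maximal distance from the root, and None otherwise.
    """
    hits = total = 0
    for draw in permutations(range(len(labels)), j):
        total += 1
        groups = [labels[i] for i in draw]
        if None not in groups and len(set(groups)) == j:
            hits += 1
    return Fraction(hits, total)


# Hypothetical example: group sizes (2, 1, 3) plus one leaf not at maximal depth.
LABELS = [0, 0, 1, 2, 2, 2, None]
SIZES = [2, 1, 3]
for j in (2, 3):
    print(j, formula_probability(SIZES, len(LABELS), j), brute_force_probability(LABELS, j))
```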

Remark 4.2. As we will see in Section 6.4 for (k + 1)-valent and rooted k-ary
combinatorial trees, a somewhat similar argument based on the consideration of length
sequences that are minimal with respect to a suitable order leads to a stronger
result in that case. There we can not only determine T from the joint probability
distribution of its random length sequence, but if we have a random tree T with a
fixed number of leaves, then it is possible to determine the distribution of T from
the joint probability distribution of the random length sequence obtained by first
picking a realization of T and then independently picking a random ordering of the
leaves to build a random length sequence.

Formally, we have some space $\mathbb{T}$ of isomorphism types of trees, a
corresponding space $\mathbb{S}$ of possible length sequences, and a probability
kernel $\nu$ from $\mathbb{T}$ to $\mathbb{S}$, where, for $T \in \mathbb{T}$,
$\nu(T, \cdot)$ is the element of $\mathcal{P}(\mathbb{S})$, the space of probability
measures on $\mathbb{S}$, that is the joint probability distribution of the random
length sequence built from T. An affirmative answer to our question for a particular
$\mathbb{T}$ means that the map $T \mapsto \nu(T, \cdot)$ from $\mathbb{T}$ to
$\mathcal{P}(\mathbb{S})$ is injective. Given an element $\mu$ of
$\mathcal{P}(\mathbb{T})$, the space of probability measures on $\mathbb{T}$, let
$\mu\nu \in \mathcal{P}(\mathbb{S})$ be defined as usual by
$\mu\nu(B) = \int_{\mathbb{T}} \nu(T, B) \, \mu(dT)$ for $B \subseteq \mathbb{S}$.
The stronger results obtained in Section 6 say that, in the situations considered
there, the map $\mu \mapsto \mu\nu$ from $\mathcal{P}(\mathbb{T})$ to
$\mathcal{P}(\mathbb{S})$ is injective.

One can ask if an analogous strengthening is also true for ultrametric trees.
A proof along the lines of that given for Theorem 1.12 does not appear to apply
immediately in this situation where the relevant space $\mathbb{T}$ is uncountable
rather than finite. We leave this as one of many open questions.


5. Caterpillar trees: Proof of Theorem 1.10


Recall that a caterpillar is a (not necessarily simple) combinatorial tree such
that deleting the leaves of the tree results in a path consisting of $\ell + 1$
vertices (and hence $\ell$ edges of length 1).


Remark 5.1. Choosing one end of the path, we can label the vertices on the path
consecutively with $0, 1, \dots, \ell$ and denote by $n_r$ the number of leaves that
are attached to vertex r on the path. Both $n_0$ and $n_\ell$ are non-zero, but the
remaining $n_r$ may be zero.

The isomorphism types of caterpillars with n leaves are thus seen to be in a
bijective correspondence with equivalence classes of nonnegative integer sequences
$(n_0, n_1, \dots, n_{\ell-1}, n_\ell)$, where $n = n_0 + \cdots + n_\ell$ and
$n_0, n_\ell \ne 0$, and we declare that $(n_0, n_1, \dots, n_{\ell-1}, n_\ell)$ and
$(n_\ell, n_{\ell-1}, \dots, n_1, n_0)$ are equivalent.


The proof of the following, which establishes the first claim in Theorem 1.10, is
straightforward and we omit it.


Proposition 5.2. A combinatorial tree T with n leaves is a caterpillar with an
associated path of length $\ell$ if and only if

$$\max\{k : \mathbb{P}\{W_2 = k + 2\} > 0\} = \ell$$

and $W_n = \ell + n$ almost surely.

We now turn to the proof of the main claim in Theorem 1.10.

Proof. Consider a box with n tickets. Each ticket has a label belonging to
$\{0, 1, \dots, \ell\}$ and there are $n_i$ tickets with label i for
$0 \le i \le \ell$. Let $X_1, X_2, \dots, X_n$ be the result of drawing tickets
uniformly at random from the box without replacement and noting their labels. Set

$$K_r := \max_{1 \le j \le r} X_j - \min_{1 \le j \le r} X_j.$$

It is clear that $(W_2, W_3, \dots, W_n)$ has the same joint probability distribution
as $(K_2 + 2, K_3 + 3, \dots, K_n + n)$, and so it suffices to show that it is
possible to determine
$\{(n_0, n_1, \dots, n_\ell), (n_\ell, n_{\ell-1}, \dots, n_1, n_0)\}$ from a
knowledge of the joint probability distribution of
$\mathcal{K} := (K_2, \dots, K_n)$ (that is, it is possible to determine
up to a reflection the vector that gives the number of tickets with each label).

To begin with, note that, as in Proposition 5.2,

$$\max\{k : \mathbb{P}\{K_2 = k\} > 0\} = \ell,$$

and so we can determine $\ell$ from the joint probability distribution of
$\mathcal{K}$.

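Observe next that

$$\mathbb{P}\{K_2 = \ell\} = \mathbb{P}\{(X_1, X_2) \in \{(0, \ell), (\ell, 0)\}\} = \frac{2 n_0 n_\ell}{n(n-1)}$$

and

$$\max\{k : \mathbb{P}\{K_2 = 0, \dots, K_k = 0, K_{k+1} = \ell\} > 0\} = n_0 \vee n_\ell.$$

We can thus determine the multiset $\{n_0, n_\ell\}$ and, in particular,
$n_0 + n_\ell$.

For $1 \le r < \ell$ we have

$$\mathbb{P}\{K_2 = r, K_3 = \ell\}
= \mathbb{P}\{(X_1, X_2, X_3) \in \{(0, r, \ell), (r, 0, \ell), (\ell, \ell - r, 0), (\ell - r, \ell, 0)\}\}
= \frac{2 n_0 (n_r + n_{\ell - r}) n_\ell}{n(n-1)(n-2)},$$

and so we can determine $n_r + n_{\ell - r}$. If $\ell$ is even, then

$$\mathbb{P}\{K_2 = \tfrac{\ell}{2}, K_3 = \ell\}
= \mathbb{P}\{(X_1, X_2, X_3) \in \{(0, \tfrac{\ell}{2}, \ell), (\tfrac{\ell}{2}, 0, \ell), (\ell, \tfrac{\ell}{2}, 0), (\tfrac{\ell}{2}, \ell, 0)\}\}
= \frac{4 n_0 n_{\ell/2} n_\ell}{n(n-1)(n-2)},$$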
and so we can determine $n_{\ell/2}$.

Moreover,

$$\mathbb{P}\{K_2 = 0\} = \sum_{r=0}^{\ell} \mathbb{P}\{X_1 = r, X_2 = r\} = \sum_{r=0}^{\ell} \frac{n_r (n_r - 1)}{n(n-1)}$$

and, for $1 \le k \le \ell$,

$$\mathbb{P}\{K_2 = k\} = \sum_{r=0}^{\ell - k} \mathbb{P}\{(X_1, X_2) \in \{(r, r + k), (r + k, r)\}\} = \frac{2 \sum_{r=0}^{\ell - k} n_r n_{r+k}}{n(n-1)}.$$

We can therefore determine $\sum_{r=0}^{\ell - k} n_r n_{r+k}$ for $0 \le k \le \ell$.
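The formulas for the distribution of $K_2$ are easy to confirm by enumeration on a small box of tickets. The sketch below is a minimal illustration: the function names and the ticket counts (2, 0, 1, 3) are assumptions chosen for the example. It lists all ordered pairs of distinct tickets and compares the resulting probabilities with the sums $\sum_r n_r n_{r+k}$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def k2_distribution(counts):
    """Exact distribution of K_2 = |X_1 - X_2| for ticket counts n_0, ..., n_l."""
    labels = [r for r, c in enumerate(counts) for _ in range(c)]
    n = len(labels)
    dist = Counter()
    for i, j in permutations(range(n), 2):
        dist[abs(labels[i] - labels[j])] += Fraction(1, n * (n - 1))
    return dict(dist)


def autocorrelation(counts, k):
    """Sum of n_r * n_{r+k} over 0 <= r <= l - k."""
    return sum(counts[r] * counts[r + k] for r in range(len(counts) - k))


COUNTS = [2, 0, 1, 3]          # hypothetical caterpillar with l = 3 and n = 6
DIST = k2_distribution(COUNTS)
N = sum(COUNTS)
for k in range(1, len(COUNTS)):
    print(k, DIST[k], Fraction(2 * autocorrelation(COUNTS, k), N * (N - 1)))
```

Reversing the list of counts leaves the distribution unchanged, which reflects the fact that the caterpillar is only determined up to a reflection of the path.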

We claim that the information we have just derived suffices to determine
$\{(n_0, n_1, \dots, n_\ell), (n_\ell, n_{\ell-1}, \dots, n_1, n_0)\}$. That is, if
$n'_0, \dots, n'_\ell$ is a sequence with

$$n_0 + \cdots + n_\ell = n'_0 + \cdots + n'_\ell = n,$$

$$n_r + n_{\ell - r} = n'_r + n'_{\ell - r}$$

for $0 \le r \le \ell$, and

$$\sum_{r=0}^{\ell - k} n_r n_{r+k} = \sum_{r=0}^{\ell - k} n'_r n'_{r+k}$$

for $0 \le k \le \ell$, then either $n_r = n'_r$ for $0 \le r \le \ell$ or
$n_r = n'_{\ell - r}$ for $0 \le r \le \ell$.

To see that this is so, introduce the Fourier transforms

$$g(z) := \sum_{k=0}^{\ell} n_k e^{ikz}$$

and

$$g'(z) := \sum_{k=0}^{\ell} n'_k e^{ikz}$$

for $z \in \mathbb{C}$. These are entire functions that uniquely determine
$n_0, \dots, n_\ell$ and $n'_0, \dots, n'_\ell$. Note that

$$\sum_{k=0}^{\ell} n_{\ell - k} e^{ikz} = e^{i \ell z} g(-z),$$

and a similar formula holds for $g'$. It will thus suffice to show that either
$g(z) = g'(z)$ or $g(z) = e^{i \ell z} g'(-z)$ (equivalently,
$g'(z) = e^{i \ell z} g(-z)$).

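It follows from the assumption that

$$\sum_{r=0}^{\ell - k} n_r n_{r+k} = \sum_{r=0}^{\ell - k} n'_r n'_{r+k}$$

for $0 \le k \le \ell$ that if we define $N : \mathbb{Z} \to \mathbb{Z}$ by

$$N(j) := \begin{cases} n_j, & 0 \le j \le \ell, \\ 0, & \text{otherwise}, \end{cases}$$

and define $N'$ similarly, then

$$\sum_{\{r, j \in \mathbb{Z} : r - j = k\}} N(r) N(j) = \sum_{\{r, j \in \mathbb{Z} : r - j = k\}} N'(r) N'(j)$$

for all $k \in \mathbb{Z}$ and hence

$$g(z) g(-z) = g'(z) g'(-z)$$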

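for all $z \in \mathbb{C}$. By Theorem 2.2 in [RS82], there exist finitely supported
functions $C : \mathbb{Z} \to \mathbb{Z}$ and $D : \mathbb{Z} \to \mathbb{Z}$ such
that if we set

$$\phi(z) := \sum_{k \in \mathbb{Z}} C(k) e^{ikz}$$

and

$$\psi(z) := \sum_{k \in \mathbb{Z}} D(k) e^{ikz},$$

then

$$g(z) = \phi(z) \psi(z)$$

and

$$g'(z) = \phi(z) \psi(-z).$$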
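The passage from the sums $\sum_r n_r n_{r+k}$ to the product $g(z) g(-z)$ is the statement that the coefficient of $e^{ikz}$ in $g(z) g(-z)$ is $\sum_{r - j = k} N(r) N(j)$. A small numerical check in Python follows; it is a sketch only, with the sign convention $e^{ikz}$ for the transform, the function names, and the example counts all being assumptions made for the illustration.

```python
import cmath


def g(counts, z):
    """Fourier transform sum_k n_k e^{ikz} of a finite sequence of counts."""
    return sum(n_k * cmath.exp(1j * k * z) for k, n_k in enumerate(counts))


def correlation_coefficients(counts):
    """Coefficients c_k, -l <= k <= l, with g(z) g(-z) = sum_k c_k e^{ikz}."""
    l = len(counts) - 1
    return {k: sum(counts[r] * counts[r - k]
                   for r in range(max(0, k), min(l, l + k) + 1))
            for k in range(-l, l + 1)}


COUNTS = [2, 0, 1, 3]                       # hypothetical ticket counts
COEFFS = correlation_coefficients(COUNTS)
Z = 0.7                                     # arbitrary test point
lhs = g(COUNTS, Z) * g(COUNTS, -Z)
rhs = sum(c * cmath.exp(1j * k * Z) for k, c in COEFFS.items())
print(abs(lhs - rhs) < 1e-9)
# A sequence and its reflection share the same coefficients.
print(correlation_coefficients(COUNTS[::-1]) == COEFFS)
```

The second check shows why the correlation sums alone cannot separate a sequence from its reflection, so the reconstruction can only hold up to that symmetry.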
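It follows from the assumption that

$$n_r + n_{\ell - r} = n'_r + n'_{\ell - r}$$

for $0 \le r \le \ell$ that

$$g(z) + e^{i \ell z} g(-z) = g'(z) + e^{i \ell z} g'(-z)$$

for all $z \in \mathbb{C}$. Therefore,

$$\phi(z) \psi(z) + e^{i \ell z} \phi(-z) \psi(-z) = \phi(z) \psi(-z) + e^{i \ell z} \phi(-z) \psi(z)$$

and hence

$$(\phi(z) - e^{i \ell z} \phi(-z)) (\psi(z) - \psi(-z)) = 0$$

for all $z \in \mathbb{C}$. Because the functions
$z \mapsto \phi(z) - e^{i \ell z} \phi(-z)$ and $z \mapsto \psi(z) - \psi(-z)$
are both entire, we must have either that $\phi(z) = e^{i \ell z} \phi(-z)$ for all
$z \in \mathbb{C}$ or $\psi(z) = \psi(-z)$ for all $z \in \mathbb{C}$. If
$\phi(z) = e^{i \ell z} \phi(-z)$ for all $z \in \mathbb{C}$, then

$$g'(z) = \phi(z) \psi(-z) = e^{i \ell z} \phi(-z) \psi(-z) = e^{i \ell z} g(-z)$$

and $n'_r = n_{\ell - r}$ for $0 \le r \le \ell$. If $\psi(z) = \psi(-z)$ for all
$z \in \mathbb{C}$, then

$$g'(z) = \phi(z) \psi(-z) = \phi(z) \psi(z) = g(z)$$
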
and $n_r = n'_r$ for $0 \le r \le \ell$. □

6. (k + 1)-valent and rooted k-ary trees

We now turn our focus to the cases of (k + 1)-valent and k-ary trees. Recall that
a (k + 1)-valent tree is a tree with all vertices of degree either k + 1 or 1. For
$k \ge 2$ a rooted k-ary tree is a tree with one vertex of degree k and the rest of
degrees either k + 1 or 1. We refer to the rooted 2-ary tree as a rooted binary tree.
Note that any k-ary tree is obtained by removing one leaf of a suitable
(k + 1)-valent tree.

Our general proof methodology for these families of trees is similar to that used
in Section 4 for ultrametric trees. We first define a particular class of sequences
that can appear as elements of $\mathcal{F}_T$ (the down-split sequences) and a total
order on such sequences. We then show that the minimal down-split sequence in
$\mathcal{F}_T$ uniquely identifies T.

The idea of the proof is the same for all k and depends on the following fact.

Lemma 6.1. Let T be a (k + 1)-valent tree or a rooted k-ary tree and let S be a
subtree of T. Then S is a rooted k-ary tree if and only if

$$\#E(S) = \frac{k}{k-1} (\#L(S) - 1).$$

Proof. Because S is a subtree of T, every interior vertex of S has degree at most
$k + 1$. Write $d_1 := \#L(S), d_2, \dots, d_{k+1}$ for the number of vertices of S of
degrees $1, 2, \dots, k + 1$. We need to show that $d_j = 0$ for $1 < j \le k - 1$ and
$d_k = 1$, or, equivalently, that $d_k = 1$ and
$d_{k+1} = \sum_{j=2}^{k+1} d_j - 1 = \#V(S) - d_1 - 1 = \#E(S) - d_1$.

This is in turn equivalent to showing that

$$\sum_{j=2}^{k+1} j d_j = k + (k + 1)(\#E(S) - d_1),$$

which, by the "handshaking identity"

$$2 \#E(S) = \sum_{j=1}^{k+1} j d_j,$$

becomes

$$2 \#E(S) - d_1 = k + (k + 1)(\#E(S) - d_1)$$

or, upon rearranging,

$$\#E(S) = \frac{k}{k-1}(d_1 - 1) = \frac{k}{k-1}(\#L(S) - 1).$$

□

For simplicity of notation we present the details of the proof for the case of
(unrooted) 3-valent trees and rooted binary trees (that is, k = 2). We end in
Section 6.4 with a discussion of the extension to general k.

6.1. 3-valent and rooted binary trees. Our proof of Theorem 1.11 begins with
an analysis of random length sequences for marked (also known as planted) 3-valent
trees. A marked 3-valent tree $(T, v)$ is a 3-valent tree T and a distinguished leaf
v of T. We define the modified random length sequence $\mathcal{W}_{(T,v)}$ of
$(T, v)$ to be the random length sequence $\mathcal{W}_T$ of T conditioned on
$Y_1 = v$.

6.2. Down-split sequences. We need to distinguish some particular sequences
that appear in the support of $\mathcal{W}_{(T,v)}$.


Remark 6.2. As usual, we can define a partial order on V(T) by declaring that
x precedes y if $x \ne y$ and x is on the path between v and y, and we can extend this
partial order to a total order < such that if w, x, y, z are such that w and x are not
comparable in the partial order but w < x, w precedes y in the partial order, and
x precedes z in the partial order, then y < z. Such a total order corresponds to
embedding T in the plane and listing the elements of V(T) in the order they are
encountered as one walks around T starting from v.

Suppose that $v = y_1 < y_2 < \cdots < y_n$ is the ordered listing of L(T). Set
$s_k = W_T(\{y_1, \dots, y_k\})$, $2 \le k \le n$. If $s_k = 2k - 2$, then the subtree
spanned by the k leaves $\{y_1, \dots, y_k\}$ has $2k - 2$ edges and hence, by
Lemma 6.1, this subtree is a rooted binary tree. If we write o for the vertex adjacent
to the marked leaf v, denote by $v', v''$ the other two vertices adjacent to o,
suppose that $v' < v''$, and write $k_s$ for the smallest k with $s_k = 2k - 2$, then
it must be the case that
$\{y_2, \dots, y_{k_s}\} = \{y \in L(T) : v' \text{ is on the path between } v \text{ and } y\}$.
Write $T'$ (respectively, $T''$) for the subtree of T consisting of o and the vertices
u such that $v'$ (respectively, $v''$) is on the path from o to u. The sequence
$(s'_2, \dots, s'_{n'}) := (s_2 - 1, \dots, s_{k_s} - 1)$ satisfies
$s'_k = W_{T'}(\{o, y_2, \dots, y_k\})$ for $2 \le k \le n' = k_s$. The sequence
$(s''_2, \dots, s''_{n''}) := (s_{k_s + 1} - (2k_s - 2), \dots, s_n - (2k_s - 2))$
satisfies $s''_k = W_{T''}(\{o, y_{k_s + 1}, \dots, y_{k_s + k - 1}\})$ for
$2 \le k \le n'' = n - k_s + 1$.


Definition 6.3. A down-split sequence is an element of the class of increasing
sequences of positive integers defined recursively as follows. The sequence

$$s = (1)$$

is a down-split sequence.

A sequence $s = (s_2, \dots, s_n)$, $n > 2$, is down-split if

$$\{2 \le k < n : s_k = 2k - 2\} \ne \emptyset$$

and, setting

$$k_s = \inf\{2 \le k < n : s_k = 2k - 2\},$$

• $(s_2 - 1, \dots, s_{k_s} - 1)$ is down-split,

• $(s_{k_s + 1} - (2k_s - 2), \dots, s_n - (2k_s - 2))$ is down-split.

The index $k_s$ is the splitting index of s.

Example 6.4. For n = 3, the sequence $s = (s_2, s_3) = (2, 3)$ is a down-split
sequence. Here $k_s = 2$, $(s_2 - 1, \dots, s_{k_s} - 1) = (1)$ and
$(s_{k_s + 1} - (2k_s - 2), \dots, s_n - (2k_s - 2)) = (1)$.
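The recursive definition translates directly into a membership test. The following Python sketch is one possible rendering, with the function names `splitting_index`, `split` and `is_down_split` being our own; a sequence $(s_2, \dots, s_n)$ is represented as a tuple whose entry at position $k - 2$ is $s_k$.

```python
def splitting_index(s):
    """Smallest k in {2, ..., n - 1} with s_k = 2k - 2, or None if there is none."""
    n = len(s) + 1
    for k in range(2, n):
        if s[k - 2] == 2 * k - 2:
            return k
    return None


def split(s, k):
    """Return the two shifted pieces of s determined by the index k."""
    left = tuple(x - 1 for x in s[:k - 1])              # (s_2 - 1, ..., s_k - 1)
    right = tuple(x - (2 * k - 2) for x in s[k - 1:])   # (s_{k+1} - (2k-2), ..., s_n - (2k-2))
    return left, right


def is_down_split(s):
    """Check the recursive definition of a down-split sequence."""
    s = tuple(s)
    if s == (1,):
        return True
    k = splitting_index(s)
    if k is None:
        return False
    left, right = split(s, k)
    return is_down_split(left) and is_down_split(right)


print(is_down_split((2, 3)), splitting_index((2, 3)))   # the sequence of Example 6.4
print(is_down_split((3, 4, 5)), is_down_split((2, 3, 5)))
```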

The following result is immediate from Remark 6.2.

Lemma 6.5. For every marked 3-valent tree $(T, v)$ there is at least one down-split
sequence s with

$$\mathbb{P}\{\mathcal{W}_{(T,v)} = s\} > 0.$$

We record the following fact for later use.

Lemma 6.6. If $s = (s_2, \dots, s_n)$ is a down-split sequence then $s_n = 2n - 3$.

Proof. This follows easily by induction. If s splits at $k_s$, then, as

$$(s''_2, \dots, s''_{n''}) = (s_{k_s + 1} - (2k_s - 2), \dots, s_n - (2k_s - 2))$$

is a down-split sequence with $n'' = n - k_s + 1$, we have by the inductive hypothesis
that

$$s_n - (2k_s - 2) = 2(n - k_s + 1) - 3$$

and the claim follows. □


Example 6.7. Given any down-split sequence s, it is possible to reverse the argument
in Remark 6.2 and construct a marked 3-valent tree with a suitable total ordering on
its vertices such that s is the corresponding down-split sequence. However, a marked
3-valent tree $(T, v)$ is not uniquely identified by an arbitrary down-split sequence
in the support of $\mathcal{W}_{(T,v)}$, as the example in Figure 6.1 shows.

Figure 6.1. Two marked binary trees $(T, v)$ and $(\tilde{T}, \tilde{v})$ with
particular realizations of the random selection of leaves.

Write $(Y_1, \dots, Y_n)$ and $(\tilde{Y}_1, \dots, \tilde{Y}_n)$ for the random
selections of the leaves of $T$ and $\tilde{T}$. Suppose that the realizations are
such that $Y_k = \tilde{Y}_k$ for $4 \le k \le n$, that these leaves belong to a
subtree common to the two trees, and that they appear in an order of the type
discussed in Remark 6.2. The corresponding realizations for the modified random
length sequences are equal. The common value (3, 4, ...) is a down-split sequence with
splitting index 3. Thus, two non-isomorphic marked 3-valent trees can have a common
down-split sequence in the supports of their modified random length sequences. Note
that, for one of the two trees, the common down-split sequence results from taking
the leaves according to an order of the type described in Remark 6.2, but this is not
the case for the other tree.


With Example 6.7 in mind we see that it would be useful to have a way of recognizing
down-split sequences in the support of $\mathcal{W}_{(T,v)}$ that result from
realizations where the leaves are selected in an order that arises from a suitable
total order on the vertices of T. The key is the following total order on down-split
sequences. We re-use the notation $\prec$ that was used in Section 4 for the
lexicographic order.


Definition 6.8. Define a total order $\prec$ on the set of down-split sequences of a
given length recursively as follows. Firstly, $(1) \prec (1)$ does not hold. Next,
let s, r be down-split sequences indexed by $\{2, \dots, n\}$ with respective
splitting indices $k_s$ and $k_r$. Set

$$s' = (s_2 - 1, \dots, s_{k_s} - 1), \quad r' = (r_2 - 1, \dots, r_{k_r} - 1),$$

and

$$s'' = (s_{k_s + 1} - (2k_s - 2), \dots, s_n - (2k_s - 2)), \quad r'' = (r_{k_r + 1} - (2k_r - 2), \dots, r_n - (2k_r - 2)).$$

Declare that $s \prec r$ if

$$k_s < k_r$$

or

$$k_s = k_r \text{ and } s' \prec r'$$

or

$$k_s = k_r \text{ and } s' = r' \text{ and } s'' \prec r''.$$

The next result follows easily by induction.

Lemma 6.9. The binary relation $\prec$ is a total order on the set of down-split
sequences of a given length.

Definition 6.10. The minimal down-split sequence for a marked 3-valent tree
$(T, v)$ is the minimal element (with respect to the total order $\prec$) of the set

$$\{s \text{ down-split} : \mathbb{P}\{\mathcal{W}_{(T,v)} = s\} > 0\}.$$

We now proceed to establish some results that culminate in showing that $(T, v)$
is determined by its minimal down-split sequence.

Lemma 6.11. Let $(T, v)$ be a marked 3-valent tree, with modified random length
sequence $\mathcal{W}_{(T,v)} = (W_2, \dots, W_n)$ constructed from the random
sequence of leaves $(Y_1, \dots, Y_n)$ with $Y_1 = v$. Denote by o the vertex adjacent
to the marked leaf v and denote by $v', v''$ the other two vertices adjacent to o.
Write $T'$ (respectively, $T''$) for the subtree of T consisting of o and the vertices
u such that $v'$ (respectively, $v''$) is on the path from o to u. Set

$$m := \inf\{k : \mathbb{P}\{W_k = 2k - 2\} > 0\}.$$

Then $W_m = 2m - 2$ if and only if

$$Y_2, \dots, Y_m \in L(T') \text{ and } Y_{m+1}, \dots, Y_n \in L(T''),$$

or vice versa.


Proof. If $Y_2, \dots, Y_m \in L(T')$ and $Y_{m+1}, \dots, Y_n \in L(T'')$, then the
subtree spanned by $\{v, Y_2, \dots, Y_m\}$ consists of the leaf v adjoined to $T'$
via an edge to the vertex o. This subtree is a rooted binary tree with root o. It
follows from Lemma 6.1 that $W_m = 2m - 2$.

For the other direction, assume that $W_m = 2m - 2$. By Lemma 6.1 the subtree
S spanned by $\{v, Y_2, \dots, Y_m\}$ is a rooted binary tree with m leaves. We have
$L(S) \subseteq L(T)$, $v \in L(S)$, and
$L(S) \setminus \{v\} \subseteq (L(T') \setminus \{o\}) \cup (L(T'') \setminus \{o\})$.
We need to show that S consists of the leaf v adjoined to either $T'$ or $T''$ via an
edge to the vertex o that is common to both $T'$ and $T''$.

By the construction prior to the statement of Lemma 6.5 we know that

$$m \le \#L(T') \wedge \#L(T'')$$

and so if $L(T') \setminus \{o\} \subseteq L(S) \setminus \{v\}$, then
$L(T'') \cap L(S) = (L(T'') \setminus \{o\}) \cap (L(S) \setminus \{v\}) = \emptyset$
and similarly with the roles of $T'$ and $T''$ reversed.

We can rule out the possibility that L(S) intersects both $L(T')$ and $L(T'')$ as
follows. If $L(T') \cap L(S) \ne \emptyset$ and $L(T'') \cap L(S) \ne \emptyset$, then
$L(T') \cap L(S)$ must be a proper subset of $L(T') \setminus \{o\}$ and
$L(T'') \cap L(S)$ must be a proper subset of $L(T'') \setminus \{o\}$. If
$L(T') \cap L(S)$ is a proper, nonempty subset of $L(T') \setminus \{o\}$, then
S must have a degree 2 vertex that belongs to $V(T') \setminus \{o\}$, and similarly
for $T''$. However, S is a rooted binary tree and cannot have two or more vertices of
degree 2.

Finally, we need to rule out the possibility that $L(S) \setminus \{v\}$ is a proper
subset of $L(T') \setminus \{o\}$ or $L(T'') \setminus \{o\}$. However, if
$L(S) \setminus \{v\}$ is a proper subset of $L(T') \setminus \{o\}$, then S would
have at least one degree 2 vertex that belongs to $V(T') \setminus \{o\}$ as well as
the degree 2 vertex o, which contradicts S being a rooted binary tree. The
same argument holds with $T''$ in place of $T'$. □


Corollary 6.12. Let $(T, v)$ be a marked 3-valent tree with modified random length
sequence $\mathcal{W}_{(T,v)} = (W_2, \dots, W_n)$. Then

$$m := \inf\{k : \mathbb{P}\{W_k = 2k - 2\} > 0\}$$

is the splitting index for the minimal down-split sequence for $(T, v)$.

Proof. If $k_s$ is the splitting index of any down-split sequence s in the support of
$\mathcal{W}_{(T,v)}$, then $s_{k_s} = 2k_s - 2$ by definition. Thus
$\mathbb{P}\{W_{k_s} = 2k_s - 2\} > 0$ and hence $m \le k_s$.

On the other hand, let $o, v', v'', T', T''$ be as in the statement of Lemma 6.11.
It follows from that result that $m = \#L(T') \wedge \#L(T'')$. By the construction in
Remark 6.2 if $m = \#L(T')$, or the analogous one with the roles of $T'$ and $T''$
reversed if $m = \#L(T'')$, we may construct a down-split sequence for $(T, v)$ that
has splitting index m. By the definition of the total order $\prec$, the splitting
index for the minimal down-split sequence for $(T, v)$ is at most m. □


Proposition 6.13. Let s be the minimal down-split sequence for a marked 3-valent
tree $(T, v)$. There is no other marked 3-valent tree for which s is the minimal
down-split sequence.


Proof. We will prove this by induction. The claim is clearly true for the down-split 
sequence s = (1). 

Let $(T, v)$ be a marked 3-valent tree and $s$ the minimal down-split sequence
for $(T, v)$. Define $o, v', v'', T', T''$ as in the statement of Lemma 6.11. Let $k_s$ be
the splitting index of $s$. Let $y_1, \ldots, y_n$ be an ordered listing of $L(T)$ such that

$$ W_{(T,v)}(\{y_1, \ldots, y_k\}) = s_k \quad \text{for } 2 \le k \le n. $$

By Corollary 6.12 and Lemma 6.11 we must either have
$\{y_2, \ldots, y_{k_s}\} = L(T') \setminus \{o\}$ and $\{y_{k_s+1}, \ldots, y_n\} = L(T'') \setminus \{o\}$, or the analogous
conclusion with the roles of $T'$ and $T''$ interchanged holds (if $\#L(T') \ne \#L(T'')$,
then only one alternative is possible). We may suppose without loss of generality
that the choice of $v'$ and $v''$ is such that the first alternative holds.

Set

$$ s' := (s_2 - 1, \ldots, s_{k_s} - 1), \quad s'' := (s_{k_s+1} - (2k_s - 2), \ldots, s_n - (2k_s - 2)). $$

By definition, $s'$ and $s''$ are down-split sequences. Because $\mathbb{P}\{\mathcal{W}_{(T,v)} = s\} > 0$, we
have

$$ \mathbb{P}\{\mathcal{W}_{(T',o)} = s'\} > 0 $$

and

$$ \mathbb{P}\{\mathcal{W}_{(T'',o)} = s''\} > 0. $$

We claim that $s'$ must be the minimal down-split sequence for $(T', o)$. To see
this, note that if there was a down-split sequence $\tilde{s}'$ with $\tilde{s}' \prec s'$ such that

$$ \mathbb{P}\{\mathcal{W}_{(T',o)} = \tilde{s}'\} > 0, $$

then, writing

$$ \mathbf{m} := (m, \ldots, m) $$

for a positive integer $m$, we would have

$$ \mathbb{P}\{\mathcal{W}_{(T,v)} = (\tilde{s}' + \mathbf{1}, s'' + \mathbf{2k_s - 2})\} > 0 $$

and, by definition of the total order $\prec$,

$$ (\tilde{s}' + \mathbf{1}, s'' + \mathbf{2k_s - 2}) \prec (s' + \mathbf{1}, s'' + \mathbf{2k_s - 2}) = s. $$

This, however, contradicts the minimality of $s$. Similarly, $s''$ is the minimal
down-split sequence for $(T'', o)$. By induction, $(T', o)$ and $(T'', o)$ are uniquely
determined.

Since (T, v) is obtained by gluing (T', o) and (T'', o) together at the shared
vertex o and attaching the marked leaf v to o by an edge, we see that (T, v) is also
determined by s. □

While the proof of Proposition 6.13 is not in the form of an explicit reconstruction
procedure, the argument clearly leads to an algorithm for building a marked 3-valent
tree (T, v) from the corresponding minimal down-split sequence. Namely, (T, v) is
simply the recursion tree that results from parsing s as a down-split sequence as in 6.3, with
leaves corresponding to edges that terminate in the sequence (1).
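
The parsing procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a down-split sequence is taken to be a tuple (s_2, ..., s_n) that is either (1) or splits at the least index k with s_k = 2k - 2 into the two pieces s' and s'' used in the proof of Proposition 6.13. The function name parse_down_split and the nested-tuple output format are hypothetical choices, not notation from the paper.

```python
def parse_down_split(s):
    """Parse s = (s_2, ..., s_n) into a nested pair structure.

    The string "leaf" marks a branch that terminates in the sequence (1);
    a pair (left, right) records the parses of s' and s''.
    Assumes the recursive decomposition used in the proof of
    Proposition 6.13; raises ValueError if s cannot be parsed that way.
    """
    s = tuple(s)
    if s == (1,):
        return "leaf"
    n = len(s) + 1
    # Splitting index: least k with s_k = 2k - 2 (s_k is stored at s[k - 2]).
    # k = n is excluded because the second piece would be empty.
    for k in range(2, n):
        if s[k - 2] == 2 * k - 2:
            head = tuple(x - 1 for x in s[: k - 1])
            tail = tuple(x - (2 * k - 2) for x in s[k - 1:])
            return (parse_down_split(head), parse_down_split(tail))
    raise ValueError(f"not a down-split sequence: {s}")
```

For instance, parse_down_split((2, 4, 5)) splits at k = 2 into (1) and (2, 3), and the second piece splits again into (1) and (1); the three "leaf" entries of the result correspond to the three unmarked leaves of the tree.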

Figure 6.2. A marked 3-valent tree with its leaves ordered minimally
and the corresponding parse tree for the minimal down-split
sequence.
6.2.1. Proof of Theorem 1.11 and Theorem 1.12. From Proposition 6.13 we are able to easily
prove Theorem 1.11 for (unmarked) 3-valent trees.


Proof. Let $T$ be a fixed (unknown) 3-valent tree with $n$ leaves and let $\mathcal{W}_T$ be its
random length sequence. Conditional on $Y_1$, $\mathcal{W}_T$ is the modified random length
sequence of the marked binary tree $(T, Y_1)$. Thus, if

$$ \mathbb{P}\{\mathcal{W}_T = s\} > 0, $$

then there must be some leaf $v \in T$ such that

$$ \mathbb{P}\{\mathcal{W}_{(T,v)} = s\} > 0. $$

Let $s^*$ be the minimal element of the set

$$ \{s \text{ down-split} : \mathbb{P}\{\mathcal{W}_T = s\} > 0\}. $$

Then $s^*$ must be the minimal down-split sequence for $(T, v)$ for at least one leaf $v$
of $T$. By Proposition 6.13 we can reconstruct $(T, v)$ and hence $T$ from $s^*$. □


The above argument can be pushed further to prove Theorem 1.12 for T a 
random 3-valent tree. 


Proof. Let $\mathcal{T}$ be a random 3-valent tree with $n$ leaves and random length sequence
$\mathcal{W}_{\mathcal{T}}$.

Given a 3-valent tree $T$ with $n$ leaves, let $s^T$ be the minimal element of the set
of down-split sequences of the marked 3-valent trees $(T, v)$ as $v$ ranges over $L(T)$.
We equip the set of 3-valent trees with $n$ leaves with a total order that, with a slight
abuse of notation, we denote by $\prec$ by declaring that $T' \prec T''$ if $s^{T'} \prec s^{T''}$. Note
that if $T' \prec T''$, then $\mathbb{P}\{\mathcal{W}_{T'} = s^{T'}\} > 0$ and $\mathbb{P}\{\mathcal{W}_{T''} = s^{T'}\} = 0$. Now, for each
choice of $T$ we have

$$ \mathbb{P}\{\mathcal{W}_{\mathcal{T}} = s^T\} = \sum_{T'} \mathbb{P}\{\mathcal{T} = T'\}\, \mathbb{P}\{\mathcal{W}_{T'} = s^T\} = \sum_{T' \preceq T} \mathbb{P}\{\mathcal{T} = T'\}\, \mathbb{P}\{\mathcal{W}_{T'} = s^T\}, $$

and the conclusion that we can recover $\mathbb{P}\{\mathcal{T} = T\}$ as $T$ ranges over the 3-valent
trees with $n$ leaves follows simply from the observation that if $b$ is a row vector of
length $N$ and $A$ is an $N \times N$ matrix that has all entries below the diagonal zero
and all entries on the diagonal strictly positive, then there is a unique row vector
$x$ of length $N$ such that $b = xA$. □
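
The linear-algebra observation that closes the proof is a substitution argument, and it can be made concrete with a short Python sketch. This is an illustration only: the function name solve_row_upper_triangular is hypothetical, exact rational arithmetic is used to avoid rounding, and the 2 x 2 example values are made up for the demonstration rather than derived from any particular family of trees.

```python
from fractions import Fraction


def solve_row_upper_triangular(b, A):
    """Return the unique row vector x with b = x A.

    A must have zeros below the diagonal and strictly positive diagonal
    entries, so that b_j = sum_{i <= j} x_i A[i][j] can be solved for
    x_0, x_1, ... in turn.
    """
    n = len(b)
    x = []
    for j in range(n):
        if A[j][j] <= 0:
            raise ValueError("diagonal entries must be strictly positive")
        known = sum(x[i] * A[i][j] for i in range(j))
        x.append((Fraction(b[j]) - known) / Fraction(A[j][j]))
    return x


# Made-up example: x plays the role of the unknown tree probabilities.
A = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 3)]]
b = [Fraction(1, 8), Fraction(5, 16)]
x = solve_row_upper_triangular(b, A)
```

Here the rows and columns of A are indexed by trees listed in increasing order, with entry A[T'][T] playing the role of the probability that the random length sequence of T' equals the minimal sequence of T.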

6.3. Up-split sequences. We now prove Theorem 1.11 and Theorem 1.12 for
rooted binary trees. Analogous to the objects we introduced for marked 3-valent 
trees, we begin with a definition of a class of sequences that will appear in the 
support of the random length sequence of a rooted binary tree. 

Definition 6.14. An up-split sequence is an element of the class of increasing
sequences of nonnegative integers defined recursively as follows.

• The sequence

$$ s = (0) $$

is an up-split sequence.

• A sequence $s = (s_1, \ldots, s_n)$, $n > 1$, is an up-split sequence if

$$ \{1 \le k < n : s_k = 2k - 2\} \ne \emptyset $$

and, setting

$$ k_s := \sup\{1 \le k < n : s_k = 2k - 2\}, $$

  • $(s_1, \ldots, s_{k_s})$ is an up-split sequence,

  • $(s_{k_s+1} - (2k_s - 1), \ldots, s_n - (2k_s - 1))$ is a down-split sequence.

The index $k_s$ is the splitting index of $s$.
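
The recursion in Definition 6.14 translates directly into a membership test. The Python sketch below is offered as an illustration under assumptions: the helper is_down_split uses the recursive decomposition of down-split sequences that appears in the proof of Proposition 6.13 (base case (1), split at the least k with s_k = 2k - 2), "increasing" is read as strictly increasing, and both function names are hypothetical.

```python
def is_down_split(s):
    """Assumed recursive test for a down-split sequence s = (s_2, ..., s_n)."""
    s = tuple(s)
    if s == (1,):
        return True
    n = len(s) + 1
    # Split at the least k with s_k = 2k - 2 (s_k is stored at s[k - 2]).
    for k in range(2, n):
        if s[k - 2] == 2 * k - 2:
            head = tuple(x - 1 for x in s[: k - 1])
            tail = tuple(x - (2 * k - 2) for x in s[k - 1:])
            return is_down_split(head) and is_down_split(tail)
    return False


def is_up_split(s):
    """Test whether s = (s_1, ..., s_n) is an up-split sequence."""
    s = tuple(s)
    if s == (0,):
        return True
    n = len(s)
    # Assumption: "increasing" means strictly increasing.
    if n < 2 or s[0] < 0 or any(a >= b for a, b in zip(s, s[1:])):
        return False
    # Candidate indices 1 <= k < n with s_k = 2k - 2 (s_k is stored at s[k - 1]).
    hits = [k for k in range(1, n) if s[k - 1] == 2 * k - 2]
    if not hits:
        return False
    k = max(hits)  # the splitting index is the largest such k
    head = s[:k]
    tail = tuple(x - (2 * k - 1) for x in s[k:])
    return is_up_split(head) and is_down_split(tail)
```

For example, (0, 3, 4) splits at k = 1 into (0) and the down-split sequence (2, 3), while (0, 2, 4) splits at k = 2 into (0, 2) and (1); the sequence (0, 2, 3) is rejected because its last entry is not 2n - 2 = 4.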


Example 6.15. Suppose that $T$ is a rooted binary tree with root $o$. In a manner
similar to the construction in 6.2 we can define a partial order on $V(T)$ by declaring
that $x$ precedes $y$ if $x \ne y$ is on the path between $o$ and $y$, and we can extend this
partial order to a total order $<$ such that if $w, x, y, z$ are such that $w$ and $x$ are not
comparable in the partial order but $w < x$, $w$ precedes $y$ in the partial order, and
$x$ precedes $z$ in the partial order, then $y < z$. Suppose that $y_1 < y_2 < \cdots < y_n$ is
the ordered listing of $L(T)$. Set $s_1 := 0$ and $s_k := W_T(\{y_1, \ldots, y_k\})$, $2 \le k \le n$.
Then $(s_1, \ldots, s_n)$ is an up-split sequence. The leaves $y_1, \ldots, y_{k_s}$ and $y_{k_s+1}, \ldots, y_n$
respectively span the two binary subtrees $T'$ and $T''$ that are rooted at the two
children of the root $o$. The subtree spanned by $o$ and $y_{k_s+1}, \ldots, y_n$ is a 3-valent
tree.



Figure 6.3. A rooted binary tree split as a rooted binary subtree 
T' and a marked 3-valent tree (T",o). 


The following analogue of Lemma 6.5 is clear from Figure 6.3.

Lemma 6.16. For every rooted binary tree T, there is at least one up-split sequence 
s with 

$$ \mathbb{P}\{(0, \mathcal{W}_T) = s\} > 0. $$


The following analogue of Lemma 6.6 can be established using a similar inductive
proof.

Lemma 6.17. If $s = (s_1, \ldots, s_n)$ is an up-split sequence then $s_n = 2n - 2$.

Definition 6.18. Define a total order $\prec$ on the set of up-split sequences of a given
length recursively as follows. Firstly, $(0) \prec (0)$ does not hold. Next, let $s$ and $r$
be two up-split sequences indexed by $\{1, \ldots, n\}$ with respective splitting indices $k_s$
and $k_r$. Set

$$ s' = (s_1, \ldots, s_k), \quad r' = (r_1, \ldots, r_k) $$

and

$$ s'' = (s_{k+1} - (2k - 1), \ldots, s_n - (2k - 1)), \quad r'' = (r_{k+1} - (2k - 1), \ldots, r_n - (2k - 1)), $$

where $k$ denotes the common value of $k_s$ and $k_r$ when the two splitting indices agree.
Declare that

$$ s \prec r $$

if

• $k_s > k_r$, or

• $k_s = k_r$ and $s' \prec r'$, or

• $k_s = k_r$ and $s' = r'$ and $s'' \prec r''$.


Remark 6.19. Note for up-split sequences $s$ and $r$, that $s \prec r$ implies that the
splitting index of $s$ is greater than or equal to the splitting index of $r$. For
down-split sequences $u$ and $t$, $u \prec t$ implies that the splitting index of $u$ is less than or
equal to the splitting index of $t$. This change in the direction of the inequalities
matches the switch in the definition of the splitting index from an infimum for
down-split sequences to a supremum for up-split sequences.


The next result follows easily by induction. 

Lemma 6.20. The binary relation $\prec$ is a total order on the set of up-split sequences
of a given length.


Definition 6.21. The minimal up-split sequence for a rooted binary tree $T$ is the
minimal element (with respect to the total order $\prec$) of the set

$$ \{s \text{ up-split} : \mathbb{P}\{\mathcal{W}_T = s\} > 0\}. $$

The up-split sequence analogues of Lemma 6.11 and Corollary 6.12 are the following and
they are proved in essentially the same manner.


Lemma 6.22. Given a binary tree $T$ with root $o$, let $T'$ and $T''$ be the binary
subtrees rooted at the two children of $o$. Set

$$ m := \sup\{1 \le k < n : \mathbb{P}\{W_k = 2k - 2\} > 0\}. $$

Then $W_m = 2m - 2$ if and only if $Y_1, \ldots, Y_m \in T'$ and $Y_{m+1}, \ldots, Y_n \in T''$ or vice
versa.


Corollary 6.23. Let $T$ be a rooted binary tree with random length sequence
$\mathcal{W}_T = (W_2, \ldots, W_n)$. Then

$$ m := \sup\{1 \le k < n : \mathbb{P}\{W_k = 2k - 2\} > 0\} $$

is the splitting index for the minimal up-split sequence for $T$.

The following analogue of Proposition 6.13 for up-split sequences follows from Lemma 6.22
and Corollary 6.23 in essentially the same manner that Proposition 6.13 followed from
Lemma 6.11 and Corollary 6.12.

Proposition 6.24. Let s be the minimal up-split sequence for a rooted binary tree 
T. There is no other rooted binary tree for which s is the minimal up-split sequence. 


Clearly, Proposition 6.24 completes the proof of Theorem 1.11. To establish Theorem 1.12
in the case of $\mathcal{T}$ a random rooted binary tree, we need only repeat the argument of
the proof of Theorem 1.12 given in Section 6.2.1 for 3-valent trees.
6.4. (k + 1)-valent and rooted k-ary combinatorial trees. The proof of the
extension of Theorem 1.11 and Theorem 1.12 to (k + 1)-valent and rooted k-ary
combinatorial trees for k ≥ 3 is very similar to the k = 2 case and involves the introduction
of suitable notions of down-split and up-split sequences along with appropriate total
orders on these sets of sequences. The only difference is that both types of split
sequences are now split into k smaller sequences, instead of just two. We leave the
details to the reader.


7. Open problems 

The original conjecture Question o remains open in general, both for simple 
trees with arbitrary edge weights (not in general position), and for combinatorial 
trees. An even more general question is suggested by Theorem 1.12.

Question 7.1. Let $\mathcal{T}$ be a random tree with probability distribution supported either
on the set of simple trees with n leaves and general edge weights or the set of
combinatorial trees with n leaves. Can the probability distribution of $\mathcal{T}$ be determined
uniquely from the joint probability distribution of the random length sequence $\mathcal{W}_{\mathcal{T}}$?


Even if the answer to Question 7.1 is “no”, the answer may still be “yes” if the
probability distribution of $\mathcal{T}$ is known a priori to belong to some particular
family of probability distributions. There are, of course, many families of probability
models for random trees with n leaves that are described by a small number
of parameters (for example, conditioned Galton-Watson models, the various
preferential attachment models), and perhaps the values of these parameters can be
determined from the joint probability distribution of the random length sequence
of a random tree that is known a priori to be distributed according to a member
of one of these families.


Question 7.2. What are the necessary and sufficient conditions on a vector for
there to be an edge-weighted tree $T$ such that the vector is in the support of $\mathcal{W}_T$?

We remarked in the Introduction that the focus of this paper is superficially
similar to that in [KS85], where the problem of reconstructing a combinatorial tree
from its number deck (the sizes of the subtrees in the forests produced by deleting
each vertex) was studied. The lists of lists that are the number deck of some
combinatorial tree are characterized in [KEM86].

Question 7.3. Are there more parsimonious quantities derived from the joint
probability distribution of the random length sequence that still carry a lot of
information about $T$? For example, how much information about $T$ is contained in the
expectation $(\mathbb{E}[W_2], \ldots, \mathbb{E}[W_n])$ of the random length sequence and is it possible to
characterize those vectors which can arise as the expectation of the random length
sequence?
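
To make the expectation vector in Question 7.3 concrete, the following Python sketch computes it exactly for a small edge-weighted tree. It is an illustration only and rests on an elementary observation added here, not on a result stated in the paper: the first k sampled leaves form a uniformly random k-subset of the n leaves, and an edge e belongs to the spanned subtree unless all k leaves fall on one side of e, so that E[W_k] is the sum over edges e of w_e (1 - [C(a_e, k) + C(n - a_e, k)] / C(n, k)), where a_e is the number of leaves on one side of e. The function name expected_lengths, the input format and the example star tree are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_lengths(edges, leaves):
    """Return [E[W_2], ..., E[W_n]] for a tree given as (u, v, weight) triples."""
    adj = {}
    for u, v, _ in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = set(leaves)
    n = len(leaves)
    # For each edge, count the leaves on the u-side after deleting the edge.
    sides = []
    for u, v, w in edges:
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if {x, y} != {u, v} and y not in seen:
                    seen.add(y)
                    stack.append(y)
        sides.append((len(leaves & seen), w))
    result = []
    for k in range(2, n + 1):
        total = Fraction(0)
        for a, w in sides:
            # Probability that all k sampled leaves lie on one side of the edge.
            missed = Fraction(comb(a, k) + comb(n - a, k), comb(n, k))
            total += w * (1 - missed)
        result.append(total)
    return result


# Hypothetical example: a star with centre c and edge weights 1, 2, 3.
star = [("c", "x", 1), ("c", "y", 2), ("c", "z", 3)]
expectation = expected_lengths(star, ["x", "y", "z"])
```

For the star, the three possible pairs of leaves span lengths 3, 4 and 5, so E[W_2] = 4, and E[W_3] = 6 is the total length. Under the identity above, the expectation vector depends on the tree only through the weights and leaf-split sizes of its edges.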